DNA glycosylases and their use

ABSTRACT

Novel cytosine-, thymine- and uracil-DNA glycosylases, subcellular localization peptides, nucleic acid molecules containing the same, methods of identifying such enzymes and their use in various methods including mutagenesis, cell killing and DNA sequencing and modification, are described.

This invention relates to new DNA glycosylases, in particular new cytosine-, thymine- and uracil-DNA glycosylases, and their use for mutagenesis, for DNA modification and cell killing.

Damage to DNA arises continually throughout the cell cycle and must be recognised and repaired prior to the next round of replication to maintain the genomic integrity of the cell. DNA base damage can be recognised and excised by the ATP-dependent nucleotide excision repair systems or by base excision repair systems exemplified by the DNA glycosylases.

DNA glycosylases are enzymes that occur normally in cells. They release bases from DNA by cleaving the bond between deoxyribose and the base in DNA. Naturally occurring glycosylases remove damaged or incorrectly placed bases. This base excision repair pathway is the major cellular defence mechanism against spontaneous DNA damage.

DNA glycosylases which have been identified are directed to specific bases or modified bases. An example of a DNA glycosylase which recognizes an unmodified base is uracil DNA glycosylase (UDG), which specifically recognises uracil in DNA and initiates base excision repair by hydrolysing the N-C1′ glycosylic bond linking the uracil base to the deoxyribose sugar. This creates an abasic site that is removed by a 5′-acting apurinic/apyrimidinic (AP) endonuclease and a deoxyribophosphodiesterase, leaving a gap which is filled by DNA polymerase and closed by DNA ligase.

The activity of UDG serves to remove uracil which arises in DNA as a result of incorporation of dUMP instead of dTMP during replication or from the spontaneous deamination of cytosine. Deamination of cytosine to uracil creates a premutagenic U:G mismatch that, unless repaired, will cause a GC→AT transition mutation.
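The way an unrepaired U:G mismatch becomes a GC→AT transition can be traced with a short sketch. This is an illustrative toy model only: the five-base strand, the function names and the simplification of writing both strands position by position (rather than antiparallel) are assumptions made for the example and are not taken from the specification.

```python
# Base pairing used by a replicating polymerase; uracil templates adenine.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def replicate(template: str) -> str:
    """Return the strand synthesised against `template`, aligned position by position."""
    return "".join(PAIR[base] for base in template)


def deaminate(strand: str, pos: int) -> str:
    """Convert the cytosine at index `pos` to uracil (spontaneous deamination)."""
    if strand[pos] != "C":
        raise ValueError("only cytosine can be deaminated to uracil")
    return strand[:pos] + "U" + strand[pos + 1:]


top = "ATCGA"                       # hypothetical strand with a C at index 2
bottom = replicate(top)             # its partner strand carries G at index 2
damaged = deaminate(top, 2)         # C -> U, giving a U:G mismatch
first_round = replicate(damaged)    # U templates A in the new strand
second_round = replicate(first_round)  # A templates T: the pair is now T:A

print(top, bottom)                  # original C:G pair at index 2
print(second_round, first_round)    # T:A pair at index 2 after two rounds
```

Excision of the uracil by UDG before the first round of replication would restore the original C:G pair, which is why the mismatch is described as premutagenic rather than mutagenic.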

In vivo, UDGs specifically recognise and remove uracil from within DNA and cleave the glycosylic bond to initiate the uracil excision pathway. In vitro, UDGs can recognise and remove uracil from both single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) substrates.

UDGs are ubiquitous enzymes and have been isolated from a number of sources. Amino acid sequencing reveals that the enzymes are conserved throughout evolution with greater than about 55% amino acid identity between human and bacterial proteins. A cDNA for human UDG has been cloned and the corresponding gene has been named UNG (Olsen et al. (1989) EMBO J., 8: 3121-3125).

The crystal structures of the human enzyme (Mol et al., (1995) Cell, 80: 869-878) and the herpes simplex virus enzymes (Sava et al. (1995) Nature, 373: 487-493) have recently been determined and reveal that uracil binds in a rigid pocket at the base of the DNA binding groove of human UDG. The absolute specificity of the enzyme for uracil over the structurally related DNA bases thymine and cytosine is conferred by shape complementarity, as well as main chain and side chain hydrogen bonds.

Although UDGs do not have activity against other bases as a result of the aforementioned specific spatial and charge characteristics of the active site, other glycosylases with different activities have been identified, which may or may not be restricted to single substrates.

A naturally-occurring thymine-DNA glycosylase has been identified which in addition to releasing thymine also releases uracil (Nedderman & Jiricny (1993) J. Biol. Chem., 268: 21218-21114; Nedderman & Jiricny (1994) Proc. Natl. Acad. Sci. U.S.A., 91: 1642-1646). This thymine-DNA glycosylase, however, has activity in respect of only certain substrates: it has an absolute requirement for a mismatched U or T opposite a G in a double-stranded substrate and will not recognise T or U in T(U):A matches or in a single-stranded substrate. DNA glycosylases which recognize and release unmodified bases other than uracil and thymine (in certain substrates, as mentioned above) have not been identified.

A DNA glycosylase recognizing unmodified cytosine has not been reported, although a 5-hydroxymethylcytosine-DNA glycosylase activity was detected in mammalian cells (Cannon et al. (1988) Biochem. Biophys. Res. Comm., 151: 1173-1179). The sequences of the aforementioned thymine and 5-hydroxymethylcytosine DNA glycosylases have not yet been reported and it is unknown whether their active site may be structurally related to UDG.

It has now surprisingly been found that the substitution of certain of the UDG amino acids has a profound effect on the substrate specificity of the glycosylase. In particular, the replacement of Asn204 by Asp204 results in the production of a mutant enzyme which has acquired cytosine-DNA glycosylase (CDG) activity, while retaining some UDG activity. Alternatively, replacing Tyr147 with Ala147 allows for binding of thymine, resulting in an enzyme that has acquired thymine-DNA glycosylase (TDG) activity.
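The two substitutions can be written in the usual shorthand as N204D and Y147A. The sketch below applies them to a window of the protein and checks that the wild-type residue is present at the stated position first. It rests on assumptions that should be read with care: positions 147 and 204 are taken to follow UNG1 numbering, the window is residues 137 to 216 of the UNG2 translation given in this document, and the offset of 9 between the two numberings is inferred from the presequence lengths (44 residues in UNG2 against 35 in UNG1). The function name and window constants are hypothetical.

```python
import re

# Residues 137-216 of the UNG2 translation in this document, assumed to
# correspond to residues 128-207 in UNG1 numbering (UNG2 position minus 9).
WINDOW_START = 128
WINDOW = (
    "WTQMCDIKDVKVVILGQDPY"
    "HGPNQAHGLCFSVQRPVPPP"
    "PSLENIYKELSTDIEDFVHP"
    "GHGDLSGWAKQGVLLLNAVL"
)


def substitute(window: str, window_start: int, mutation: str) -> str:
    """Apply a point substitution such as 'N204D' after checking the wild-type residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - window_start
    if not 0 <= index < len(window):
        raise ValueError(f"position {position} lies outside the window")
    if window[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {window[index]}")
    return window[:index] + replacement + window[index + 1:]


cdg_window = substitute(WINDOW, WINDOW_START, "N204D")  # Asn204 -> Asp204
tdg_window = substitute(WINDOW, WINDOW_START, "Y147A")  # Tyr147 -> Ala147
print(cdg_window)
print(tdg_window)
```

The wild-type check is the useful part of the design: if the numbering assumption were wrong, the function would raise an error instead of silently mutating an unrelated residue.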

These new DNA glycosylases are not product-inhibited by added uracil, in contrast to UDG and other UDG-mutants. Compared with the efficiency of wild type UDG in removal of uracil, the activity of the new DNA glycosylases that remove normal pyrimidines in DNA is low, but distinct and easily detectable. However, it should be noted that the very high turnover of UDG appears to be unique among DNA glycosylases and turnover numbers of other DNA glycosylases may be as low as, or even lower than, those of the engineered glycosylases CDG and TDG. This may result from the narrow substrate specificity of UDG.

Furthermore, an additional new UDG has been identified. The complete sequence of the UNG gene was recently published (Haug et al., 1996, Genomics, 36, p408-416). As mentioned previously, cDNA to this UNG gene has been identified by Olsen et al., 1989, supra (hereinafter referred to as UNG1 cDNA and the expressed protein referred to as UNG1). Other workers have described the location, gene structure and recombinant expression of UNG1 (Slupphaug et al., 1993, Nucl. Acids Res., Vol. 21, No. 11, p. 2579-2584; Haug et al., 1994, FEBS Letters, 353, p. 180-184; Slupphaug et al., 1995, Biochemistry, 34, p. 128-138, respectively). It has now surprisingly been found that alternative splicing of the genomic DNA (UNG) with an exon located 5′ of exon 1 which was not previously recognized results in a new distinct cDNA with an open reading frame of 313 amino acids. The new UNG cDNA is referred to hereinafter as UNG2 cDNA, and the product which it encodes, UNG2. The latter protein has a predicted size of 36 kDa.

UNG2 differs from the previously known form (UNG1, ORF 304 amino acid residues) in the 44 amino acids of the N-terminal presequence, which is not necessary for catalytic activity. The rest of the presequence and the catalytic domain, altogether 269 amino acids, are identical. The alternative presequence in UNG2 arises by splicing of a previously unrecognized exon (exon 1A) into a consensus splice site after codon 35 in exon 1B (previously designated exon 1). The UNG1 presequence starts at codon 1 in exon 1B and thus has 35 amino acids not present in UNG2. Coupled transcription/translation in rabbit reticulocyte lysates demonstrated that both proteins are catalytically active. Similar forms of UNG1 and UNG2 are expressed in mouse which has an identical organization of the homologous gene. Furthermore, the presequence of a putative Xiphophorus UNG2 protein predicted from the gene structure is homologous to mammalian UNG2, but much shorter, suggesting a very high degree of conservation from fish to man.

The invention therefore provides a DNA glycosylase capable of releasing cytosine bases from single stranded (ss) DNA and/or double stranded (ds) DNA or thymine bases from both single stranded (ss) DNA and double stranded (ds) DNA or from single stranded (ss) DNA or uracil bases from single stranded (ss) DNA and/or double stranded (ds) DNA, wherein said uracil-DNA glycosylase is encoded by a nucleic acid molecule comprising the sequence (SEQUENCE I.D. No 1):

1    CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT
61   CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
121  GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
181  TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
241  GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
301  CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
361  CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
421  TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
481  GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
541  TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
601  CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
661  CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
721  GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
781  AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
841  TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
901  TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
961  TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA
1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC
1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC
1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG
1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA
1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT
1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT
1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC
1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA
1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG
1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA
1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC
1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT
1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT
1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG
1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA
1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA
2041 AAAAAAAAAA AAA

or a fragment thereof encoding a catalytically active product comprising at least nucleotides 121 to 130, preferably 71 to 202 in addition to the catalytic domain, or a sequence which is degenerate, substantially homologous with or which hybridizes with at least nucleotides 121 to 130, preferably 71 to 202 of any such aforesaid sequence.
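The nucleotide ranges named above can be read directly off the listing. The following sketch extracts them from nucleotides 61 to 240 of SEQUENCE I.D. No 1, assuming that the ranges are 1-based and inclusive and follow the position numbers printed in the listing. The helper name `subsequence` and the constant names are hypothetical.

```python
# Nucleotides 61-240 of SEQUENCE I.D. No 1, copied from the listing in
# blocks of ten; whitespace is removed when the string is assembled.
REGION_START = 61
REGION = "".join("""
CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
""".split())


def subsequence(start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of the full sequence."""
    if start < REGION_START or end >= REGION_START + len(REGION) or start > end:
        raise ValueError("range lies outside the stored region")
    return REGION[start - REGION_START:end - REGION_START + 1]


minimal = subsequence(121, 130)    # the "at least" range
preferred = subsequence(71, 202)   # the "preferably" range

print(minimal)
print(len(preferred), preferred[:3])
```

Under these assumptions the preferred range is 132 nucleotides long and begins with the ATG start codon, which matches a presequence of 44 codons.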

In particular, viewed from one aspect, the invention can be seen as providing a cytosine-DNA glycosylase (CDG) capable of releasing cytosine bases from ssDNA and/or dsDNA.

A further aspect of the invention provides a cytosine-DNA glycosylase (CDG) capable of releasing both cytosine and uracil bases from ssDNA and/or dsDNA.

Preferably, the cytosine-DNA glycosylase is one derived from a UDG and especially from the human UDG protein which has Asn at amino acid position 204. In particular, the novel CDG of the invention is preferably derived from human UDG and has an amino acid substitution or modification at position 204. Modification of UDG from other species at an equivalent residue is similarly preferred. Especially preferably, the glycosylase is human UDG having an aspartic acid residue (Asp) at position 204.

Another aspect of the invention provides a thymine-DNA glycosylase (TDG) capable of releasing thymine bases from both ssDNA and dsDNA.

A further aspect of the invention provides a thymine-DNA glycosylase (TDG) capable of releasing both thymine and uracil bases from both ssDNA and dsDNA.

Yet further aspects of the invention provide a thymine-DNA glycosylase (TDG) capable of releasing thymine bases from A:T DNA pairs and a thymine-DNA glycosylase (TDG) capable of releasing thymine bases from single stranded DNA.

Preferably, the thymine-DNA glycosylase is one derived from a UDG, and especially from the human UDG protein which has Tyr at amino acid position 147. In particular, the novel TDG of the invention is preferably derived from human UDG and has an amino acid substitution or modification at position 147. Modification of UDG from other species at an equivalent residue is similarly preferred. Especially preferably, the glycosylase is human UDG having an alanine residue (Ala) at position 147.

A yet further aspect of the invention provides a uracil-DNA glycosylase encoded by a nucleic acid molecule comprising the sequence set out in full above as SEQUENCE I.D. No 1

or a fragment thereof encoding a catalytically active product comprising at least nucleotides 121 to 130, preferably 71 to 202 in addition to the catalytic domain, or a sequence which is degenerate, substantially homologous with or which hybridizes with at least nucleotides 121 to 130, preferably 71 to 202 of any such aforesaid sequence. Preferably such degeneracy, homology or hybridization applies to the entire sequence.

The above nucleic acid molecule encodes a protein having the amino acid sequence as indicated below (SEQUENCE I.D. Nos 1 and 2):

1    CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT
61   CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
                  M  I  G   Q  K  T   L  Y  S  F   F  S  P   S  P  A
121  GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
     R  K  R  H   A  P  S   P  E  P   A  V  Q  G   T  G  V   A  G  V
181  TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
     P  E  E  S   G  D  A   A  A  I   P  A  K  K   A  P  A   G  Q  E
241  GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
     E  P  G  T   P  P  S   S  P  L   S  A  E  Q   L  D  R   I  Q  R
301  CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
     N  K  A  A   A  L  L   R  L  A   A  R  N  V   P  V  G   F  G  E
361  CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
     S  W  K  K   H  L  S   G  E  F   G  K  P  Y   F  I  K   L  M  G
421  TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
     F  V  A  E   E  R  K   H  Y  T   V  Y  P  P   P  H  Q   V  F  T
481  GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
     W  T  Q  M   C  D  I   K  D  V   K  V  V  I   L  G  Q   D  P  Y
541  TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
     H  G  P  N   Q  A  H   G  L  C   F  S  V  Q   R  P  V   P  P  P
601  CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
     P  S  L  E   N  I  Y   K  E  L   S  T  D  I   E  D  F   V  H  P
661  CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
     G  H  G  D   L  S  G   W  A  K   Q  G  V  L   L  L  N   A  V  L
721  GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
     T  V  R  A   H  Q  A   N  S  H   K  E  R  G   W  E  Q   F  T  D
781  AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
     A  V  V  S   W  L  N   Q  N  S   N  G  L  V   F  L  L   W  G  S
841  TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
     Y  A  Q  K   K  G  S   A  I  D   R  K  R  H   H  V  L   Q  T  A
901  TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
     H  P  S  P   L  S  V   Y  R  G   F  F  G  C   R  H  F   S  K  T
961  TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
     N  E  L  L   Q  K  S   G  K  K   P  I  D  W   K  E  L
1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA
1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC
1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC
1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG
1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA
1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT
1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT
1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC
1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA
1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG
1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA
1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC
1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT
1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT
1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG
1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA
1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA
2041 AAAAAAAAAA AAA

“Catalytically active product” as used herein refers to any product encoded by said sequence which exhibits uracil DNA glycosylase activity.

“Substantially homologous” as used herein includes those sequences having a sequence homology of approximately 60% or more, e.g. 70% or 80% or more, and also functionally-equivalent allelic variants and related sequences modified by single or multiple base substitution, addition and/or deletion. By “functionally equivalent” in this sense is meant nucleotide sequences which encode catalytically active polypeptides, i.e. having uracil DNA glycosylase activity.

Sequences which “hybridize” are those sequences binding under non-stringent conditions (e.g. 6×SSC, 50% formamide at room temperature) and washed under conditions of low stringency (e.g. 2×SSC, room temperature, more preferably 2×SSC, 42° C.) or conditions of higher stringency (e.g. 2×SSC, 65° C.) (where SSC=0.15M NaCl, 0.015M sodium citrate, pH 7.2). Generally speaking, sequences which hybridize under conditions of high stringency are included within the scope of the invention, as are sequences which, but for the degeneracy of the code, would hybridize under high stringency conditions.
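The salt concentrations implied by the 2×SSC and 6×SSC conditions follow by simple scaling of the 1×SSC definition given above. A minimal sketch of that arithmetic is shown here; the function and variable names are hypothetical, and the calculation assumes only that an n×SSC solution contains n times each 1× concentration.

```python
# 1x SSC as defined in the text, in mol/L.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc(fold: float) -> dict:
    """Return the component concentrations (mol/L) of a fold-x SSC solution."""
    return {component: fold * conc for component, conc in SSC_1X.items()}


for fold in (2, 6):
    composition = ssc(fold)
    print(f"{fold}x SSC: "
          f"{composition['NaCl']:.2f} M NaCl, "
          f"{composition['sodium citrate']:.3f} M sodium citrate")
```

In the wash conditions listed above the salt concentration is held at 2×SSC, so the increase in stringency comes from raising the temperature from room temperature to 42° C. and then 65° C.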

The significance of the UNG1 and UNG2 presequences has also been investigated in the present invention, by the use of constructs that express fusion products of UNG1 or UNG2 and green fluorescent protein (EGFP). Surprisingly, significant effects on subcellular targeting were observed: after transient transfection of HeLa cells, the pUNG1-EGFP-N1 product co-localized with mitochondria whereas the pUNG2-EGFP-N1 product targeted exclusively to nuclei. Whilst not wishing to be bound by theory, it appears that these sequences may be instrumental in the localization of the enzymes. The putative nuclear signal was identified as RKRH which also appears in the catalytic domain of both UNG1 and UNG2. Whilst it was recognized previously by Slupphaug et al., 1993, Nucl. Acids Res., 21(11), p2579-2584, that the signal for mitochondrial translocation resides in the UNG1 presequence, it was believed that the signal for nuclear import lay within the mature protein as in the absence of the presequence, UNG1 was transported to the nucleus. However, UNG2 has now been identified which has a presequence and which localizes to the nucleus. These presequences thus have utility for directing the subcellular localization of molecules attached to them.

Thus, viewed from a further aspect, the invention provides nuclear localization peptides encoded by a nucleic acid molecule comprising the sequence (SEQUENCE I.D. Nos 3 and 4):

ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
  M  I  G   Q  K  T   L  Y  S  F   F  S  P   S  P  A
GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
R  K  R  H   A  P  S   P  E  P   A  V  Q  G   T  G  V   A  G  V
TGAGGAAAGC GGAGATGCGG CG
P  E  E  S   G  D  A   A

or a fragment thereof encoding a functional equivalent or a sequence which is degenerate, substantially homologous with or which hybridizes with any such aforesaid sequence.

Functionally equivalent fragments refer to products which may serve as appropriate localization peptides. Especially preferred nuclear localizing peptides are those which include the amino acid sequence RKRH.
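As a check on the listing, the nucleotide sequence of SEQUENCE I.D. No 3 can be translated with the standard genetic code and searched for the RKRH motif. The sketch below does this; the variable and function names are hypothetical, and the codon table is the standard one built from the conventional TCAG ordering.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQUENCE I.D. No 3 as printed in this document (blocks of ten nucleotides).
PRESEQUENCE = "".join("""
ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
TGAGGAAAGC GGAGATGCGG CG
""".split())


def translate(dna: str) -> str:
    """Translate complete codons of `dna` in frame 1 using the standard code."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


peptide = translate(PRESEQUENCE)
motif_position = peptide.find("RKRH") + 1  # 1-based; 0 would mean "not found"

print(peptide)
print(len(peptide), motif_position)
```

The translation reproduces the 44-residue peptide shown under the nucleotide sequence, with RKRH occupying residues 17 to 20.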

A further preferred feature of the invention comprises DNA glycosylases of the invention which additionally comprise at least one of the aforesaid nuclear localization peptide sequences or at least one mitochondrial localization peptide sequence encoded by a nucleic acid molecule comprising the sequence (SEQUENCE I.D. Nos 5 and 6):

ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG
  M  G  V   F  C  L   G  P  W  G   L  G  R   K  L  R   T  P  G  K
GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG
  G  P  L   Q  L  L   S  R  L  C   G  D  H   L  Q

or a fragment thereof encoding a functional equivalent or a sequence which is degenerate, substantially homologous with or which hybridizes with any such aforesaid sequence, e.g. CDG or TDG with a localization peptide. Such a composite may be prepared for example by appropriate modification of UNG1 or UNG2.

The novel DNA glycosylases of the invention conveniently may be obtained by modification of existing DNA glycosylase enzymes, such as the human UDG mentioned above. Such modification, for example by replacement, addition or deletion of one or more amino acid residues, or indeed chemical modification of amino acid residues, may readily be achieved using methods well known in the art and include modifications both at the protein level and also at the level of the encoding nucleic acid. For example, site-directed mutagenesis techniques are widely described in the literature. Other conventional mutagenesis treatments which may be used to obtain enzymes according to the invention include random or regional random mutagenesis by chemical agents, such as N-nitroso compounds, or physical agents, such as ultraviolet light, as well as random or regional random mutagenesis by polymerase chain reaction (PCR) methods. Regional random mutagenesis may be carried out by subcloning one or more relevant DNA sequences encoding segments of the starting protein e.g. UDG, followed by random mutagenesis on this fragment or fragments. After the fragments have been mutagenized they may be reinserted into a DNA sequence encoding the starting protein e.g. UDG. Screening of individual colonies for novel DNA glycosylases of the invention may then be performed using assay methods described herein.

Alternatively, the novel DNA glycosylases of the invention may be obtained by other techniques, for example polypeptide synthesis, construction of fusion proteins etc.

DNA glycosylase activity may readily be assayed according to techniques well known in the art, see for example Slupphaug et al. (1995) Biochemistry, 34: 128-138, and Nedderman & Jiricny, supra. Assays for DNA glycosylase may be used for identifying enzymes according to the invention. The enzymes may be naturally occurring or formed as the result of manipulations of naturally occurring gene sequences or products. Thus, for example, a cell-free extract may be assayed using a thymine or cytosine-containing substrate to identify enzymes which perform excision of one or more of the bases. For the purposes of assessment, the cytosine and thymine bases in the substrates are conveniently labelled, for example fluorescently labelled or radiolabelled e.g. with ³H. Suitable substrates may be prepared by methods known in the art e.g. by nick translation, random priming, PCR or chemical synthesis. To ascertain if the enzymes are also capable of excising uracil, substrates including uracil may also be used. Conveniently, the uracil bases should be labelled to allow detection. Assays for the excision of different bases are preferably performed independently.

Thus, viewed from a yet further aspect, the invention provides an assay for the identification of DNA glycosylases of the invention in a sample, in which said assay comprises at least the step of assaying for activity in the sample which is capable of excising thymine or cytosine and optionally also uracil from an introduced ssDNA and/or dsDNA substrate. Optionally, the moiety responsible for such activity may be isolated. Suitable assays are described herein and are also known in the art.

DNA glycosylases of the invention include modifications of human UDG by amino acid replacement, as mentioned above, especially at positions 204 and 147. Such amino acid-substituted mutants of human UDG may also comprise additional modifications, for example truncation from the N- and/or C-terminal, or chemical derivatization of amino acid residues and/or addition, deletion or mutation of constituent residues which do not affect the overall specificity of the enzyme.

Derivatives of UDG or other DNA glycosylase enzymes from other genera or species, having the CDG or TDG functional activity mentioned above, are also included within the scope of the invention. It will be appreciated that appropriate modification of such enzymes would be performed on comparable residues to those in the human enzyme which form part of the active site and which could be identified by methods known in the art, e.g. by sequence comparison to human UDG and/or by mutation of residues which are identified as potentially conferring specificity to the enzyme and subsequent substrate specificity analyses of the mutant enzymes thus obtained.

The novel DNA glycosylases of the invention may have a number of uses, for example as tools in molecular biology procedures, most notably in mutagenesis, both in vitro and in vivo, but also in other areas such as cell killing, removal of contaminating DNA, random degradation of DNA, enzymatic DNA sequencing etc.

In light of the identification of mitochondrial and nuclear localizing peptides, it is now possible to direct human uracil-DNA glycosylase either to nuclei or to mitochondria by making constructs containing either a nuclear localization signal, such as in UNG2, or a mitochondrial localization signal, such as in UNG1, as mentioned above. Whilst this alone may be used to mutate RNA in the cells, this is particularly useful in combination with site directed mutations that give rise to mutants that have either TDG activity or CDG activity because it allows for selective mutagenesis of nuclear DNA or mitochondrial DNA. Furthermore, it is useful in a system where either nuclear or mitochondrial DNA is the target for degradation for the purpose of killing cells, e.g. cancer cells.

As mentioned above, DNA glycosylases according to the invention may be used in a mutagenesis system both in vitro and in vivo. These proteins have numerous advantages over typical chemical mutagens, particularly regarding their ease of use. Small-molecule mutagens, such as methylnitrosourea (MNU), methylmethanesulfonate (MMS) or methylnitrosoguanidine (MNNG) are very toxic on contact with eyes, skin or mucosal membranes and may decompose to explosive and volatile toxic compounds. Other mutagens, such as dimethylnitrosamine and benzo(a)pyrene require metabolic activation by special enzymes that are only present in some cells. They can therefore only be used under certain experimental conditions and will often require the addition of a fraction containing activating enzymes. All these chemical mutagens therefore require specialised precautions in order to protect the user. One major advantage of DNA glycosylases according to the invention is that they are not volatile and are not harmful to the user, for example, by mere skin contact.

Mutagenesis in vitro may be performed on a complex sample, e.g. a cell-free extract, a partially refined sample, e.g. nucleic-acid enriched or purified sample or on a single population of nucleic acid material, e.g. amplified nucleic acid material. Random mutation may be performed using selected DNA glycosylases of the invention (possibly in combination with one another and/or with known DNA glycosylases), to release particular bases or combinations of bases from the nucleic acid substrate. Removal of the resulting abasic site and replacement of the removed base with another base may be performed by provision of appropriate enzymes and bases.
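The release-and-replace scheme for random mutation can be pictured with a toy simulation. This is a conceptual sketch under stated assumptions and not a model of the enzymology: the function name, the fraction of bases released, the uniform choice of replacement base and the example template are all hypothetical, and no sequence-context preference or repair kinetics are represented.

```python
import random


def excise_and_refill(seq: str, target: str, fraction: float, rng: random.Random):
    """Toy model: release a fraction of `target` bases, then refill each abasic site.

    Returns the mutated sequence and the sorted list of positions that were changed.
    """
    sites = [i for i, base in enumerate(seq) if base == target]
    chosen = rng.sample(sites, round(len(sites) * fraction))
    alternatives = [base for base in "ACGT" if base != target]
    mutated = list(seq)
    for i in chosen:
        # Refill the abasic site with a base other than the one released.
        mutated[i] = rng.choice(alternatives)
    return "".join(mutated), sorted(chosen)


rng = random.Random(0)                          # fixed seed for a repeatable run
template = "ATGATCGGCCAGAAGACGCTCTACTCCTTT"     # hypothetical substrate
mutant, changed = excise_and_refill(template, "C", 0.5, rng)

print(template)
print(mutant)
print(changed)
```

Passing a different `target` stands in for choosing a glycosylase with a different base specificity, for example "T" for a TDG in place of "C" for a CDG.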

Specific mutagenesis may be performed in a number of ways. Depending on the specificity of the DNA glycosylase for ssDNA or dsDNA, either one or the other type of DNA may be targeted. One application of such a method may be to introduce labelled bases into the target DNA to identify its presence or amount in the total nucleic acid material. Alternatively, the substrate which is uniquely recognizable (e.g. dsDNA) may be made sensitive to digestion or degradation after release of the appropriate base by DNA glycosylase activity when replacement of the base has not been performed. This may then be used to remove certain ss- or ds-DNA from a sample. Such an application is discussed in more detail hereinafter.

Another application involves the introduction of selected bases after release of the specific bases recognized by the DNA glycosylase. In this way, replacement of specific bases by specific other bases may be performed. It is known from the art that the human UDG has sequence specificity for uracil excision in the sequence surrounding the uracil base (Slupphaug et al., 1995, supra). Appropriate selection of enzyme concentrations and other determinants may be employed to excise specific bases from known sequences or alternatively, by replacement with appropriately labelled bases, to determine the presence of such sequences in nucleic acid samples.

For mutagenesis in vivo, e.g. in a cell, a nucleotide sequence encoding a DNA glycosylase according to the invention under the control of a suitable expression vector may be introduced into the cell by any suitable means, for example, by transformation or through the use of liposomes.

A further aspect of the present invention thus provides a nucleic acidmolecule comprising a nucleotide sequence which encodes a DNAglycosylase and/or nuclear localizing peptide of the invention asdefined above. Such nucleic acid molecules may readily be prepared usingconventional techniques well known in the art. Thus, for example, asalready mentioned above, known gene sequences coding for DNAglycosylases, e.g. the UNG gene mentioned above, may be modified e.g. bynucleic acid substitution using standard techniques such assite-directed mutagenesis.

In further aspects the invention also provides an expression vector containing a nucleic acid molecule of the invention, and transformed or transfected host cells carrying a nucleic acid molecule of the invention.

The expression vector may be any conventional expression vector known in the art or described in the literature, including both phage and plasmid vectors. In general, these will comprise suitable regulatory sequences e.g. a promoter and/or enhancer operably connected to a gene expressing the enzyme. Suitable promoters include SV40 early or late promoter, e.g. pSVL vector, cytomegalovirus (CMV) promoter and mouse mammary tumour virus long terminal repeat, although preferably inducible promoters are used, e.g. mouse metallothionein I promoter. The vector preferably includes a suitable marker such as a gene for dihydrofolate reductase or glutamine synthetase. The expression vector may for example be an inducible vector, such as the E. coli vector pTrc99A (see Slupphaug (supra)) inducible with isopropyl β-D-thiogalactopyranoside (IPTG). Other suitable expression vectors include any vector carrying an inducible promoter, such as lac, or bacteriophage lambda λP_(L), in which the promoter is under the control of a temperature sensitive repressor (cI). Examples of such vectors are pKK223-2 and pP_(L)-Lambda Inducible (from Pharmacia). The DNA glycosylases of the invention may also be expressed as fusion proteins. The expression of such fusion proteins may facilitate purification e.g. by using a system such as the GST-gene fusion systems, exemplified by the pGEX® vector systems (Pharmacia) or the fusion proteins with peptide sequences that are recognized by specific antibodies, exemplified by the FLAG® Expression vectors (Kodak).

The host cell may likewise be any suitable host cell known in the art, including both eukaryotic e.g. yeast, mammalian and plant cells, and prokaryotic cells, e.g. bacteria.

Transfection and transformation techniques are also well known in the art, as described for example in Sambrook et al. (1989), Molecular Cloning: A laboratory manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., as indeed are other techniques for introducing nucleic acids into cells, for example using calcium phosphate, DEAE dextran, polybrene, protoplast fusion, liposomes, electroporation, direct microinjection, gene cannon etc.

Expression of the DNA glycosylase according to the invention results in the release of C or T from the cellular DNA, which may lead to transition mutations upon replication.

Mutagenesis of cells, e.g. mammalian cells, may also be performed by introduction of the DNA glycosylase protein of the invention into the cell. This may be performed using for example liposomes or other appropriate techniques known in the art.

TDG or CDG may also be used to specifically induce mutations either in the cell nucleus or mitochondria of eukaryotic cells. To obtain mutations in the nuclear DNA, this may be carried out by expressing cDNA with the complete open reading frame of UNG2, in which the N-terminal amino acid sequence contains a nuclear localization signal, as described previously, but with a site directed mutation in codon 204 (preferably Asn204Asp) or in codon 147 (preferably Tyr147Ala). To specifically obtain mitochondrial mutations, a cDNA with the complete reading frame of UNG1, in which the N-terminal amino acid sequence contains a mitochondrial localization signal, as described previously, may be expressed with similar site directed mutations to those mentioned above. For this purpose any expression vector applicable to eukaryotic cells may be used, but preferably the vector system should be inducible. To introduce the expression vectors into the cells, any method for transfection may be used. Alternatively, the same proteins may be expressed and purified and then introduced into the cells by liposome technology or other appropriate techniques in the art as mentioned previously.

Combined in vitro/in vivo mutagenesis may also be performed. For example, an isolated restriction fragment of interest (or possibly the whole plasmid) may be treated with limited amounts of cytosine-DNA glycosylase or thymine-DNA glycosylase. Subsequently, the treated fragment may be reinserted into a vector and transformed into E. coli cells (the cells may also be pre-treated with a DNA damaging agent to ensure an error-prone SOS-repair). As a result of the mutagenicity of AP-sites, this should yield random mutations. DNA glycosylases of the prior art were limited in their usefulness in mutagenesis due to their ability to achieve site-directed mutation only (see for example WO93/18175).

The Examples below describe the induction of mutations in bacterial cells by the expression within such cells of a DNA glycosylase, i.e. a CDG or TDG according to the present invention. Expression of the DNA glycosylases of the invention in the transformed cells causes an increase in mutation frequencies. Similar results may be obtained with other cells. To enhance mutagenesis, strains may be used, including both prokaryotic and eukaryotic strains, which are defective in the repair of AP-sites or are otherwise hypermutable, e.g. bacterial mutants that are defective in endonuclease IV or exonuclease III, or both, or other mutants that similarly enhance the yield of mutations.

Thus, the use of one or more DNA glycosylases according to the invention in in vitro and/or in vivo mutagenesis systems provides yet further aspects of this invention.

Another use of DNA glycosylases of the invention involves DNA modification. By treating any type of DNA (single or double-stranded) in vitro with a DNA glycosylase according to the invention, naturally-occurring C or T will be released, thus leaving an apyrimidic site (AP-site). Subsequent treatment of this DNA with alkaline solutions or enzymes such as apurinic/apyrimidinic-site endonucleases (AP-endonucleases) recognising AP-sites will cause breaks in the DNA at the AP-sites. This method may therefore be used for the random cleavage of DNA. The number of cleaved sites will depend on the amount of the DNA glycosylases according to the invention used, thus allowing the number of AP-sites and hence breaks to be controlled. Uses of such methods include the removal of possible contaminating DNA prior to PCR amplification and for the enzymatic sequencing of DNA. The random cleavage of DNA can also be used for producing randomly fragmented DNA of defined size ranges for different purposes, for example for efficient hybridization of DNA, for preparing genomic libraries or for removal of high-molecular weight viscous DNA.
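The dependence of the number of breaks on the amount of enzyme can be illustrated with a small simulation. This is only a toy sketch: the function name, the independent per-base excision probability `p_excise` (standing in for enzyme amount) and the example sequence are assumptions made for illustration and are not part of the described method.

```python
import random


def cleave_at_excised_bases(seq, target="C", p_excise=0.1, seed=0):
    """Return the fragments left after excising `target` bases and cleaving.

    Toy model (an assumption, not from the text): every `target` base is
    excised independently with probability `p_excise`, which stands in for
    the amount of glycosylase used. The excised nucleotide is dropped and
    the strand is broken at that position.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    fragments, current = [], []
    for base in seq:
        if base == target and rng.random() < p_excise:
            fragments.append("".join(current))  # break the strand here
            current = []
        else:
            current.append(base)
    fragments.append("".join(current))
    return [f for f in fragments if f]  # discard empty fragments


if __name__ == "__main__":
    # Hypothetical sequence; a higher p_excise should give shorter fragments.
    dna = "ATGCCGTACGCTTAGCCATGCAAGCTTCGA" * 20
    for p in (0.05, 0.2, 0.8):
        frags = cleave_at_excised_bases(dna, "C", p)
        print(p, len(frags), sum(map(len, frags)) / len(frags))
```

In this model, raising `p_excise` increases the number of fragments and lowers their mean length, which is the qualitative behaviour the paragraph relies on for obtaining fragments of a chosen size range.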

One advantage of using a DNA glycosylase according to the invention in such methods is that in contrast to nucleases, DNA glycosylases do not require divalent cations and this is advantageous when buffers containing divalent cations are not desirable. A further advantage is that the DNA glycosylase may be inactivated by heating the reaction mixture to 80° C. for 15 minutes, thus eliminating or substantially reducing its activity.

Uracil-DNA glycosylase has previously been shown to be efficient in removing contaminating DNA prior to PCR amplification (see for example EP-A-624643). This method has the disadvantage that only DNA containing uracil could be removed and meant that uracil-containing DNA had to be prepared using appropriate uracil-containing primers to obtain DNA which could be removed prior to amplification. One advantage of the DNA glycosylases according to the present invention is that they do not have this requirement as any contaminating DNA would be likely to contain cytosine or thymine bases. Thus, CDG and/or TDG according to the invention may be added to a reaction mix and allowed to digest contaminating DNA. After treatment the enzyme/s are inactivated prior to the addition of the DNA sample and amplification to avoid degradation of the template or product.

Thus a further aspect of the invention provides the use of one or more DNA glycosylases according to the present invention for removing contaminating DNA prior to PCR amplification. The use of one or more DNA glycosylases according to the invention in DNA modification provides a further aspect of the invention. The term “modification” as used herein refers to all forms of modifying or manipulating DNA, including cleavage, base substitution or insertion etc.

A DNA glycosylase according to the present invention may also be used in a method for the killing of cells. A DNA glycosylase according to the present invention may be introduced into specific target cells by means of known transformation techniques, liposomes, specific targeting systems such as ligands that bind to specific receptors, or any other suitable techniques. The DNA glycosylase may be expressed in a tissue-specific manner by placing a tissue-specific promoter upstream of the DNA sequence encoding a DNA glycosylase according to the present invention. Examples of such tissue-specific promoters are well known and are for example found in genes for a number of liver specific proteins such as albumin, blood clotting factors and apolipoproteins; several hormones, such as human growth hormone from the pituitary gland and insulin from the islets of Langerhans in the pancreas, as well as aromatase involved in the estrogen biosynthetic pathway; porphobilinogen deaminase which is the third enzyme in the heme biosynthetic pathway; glycoprotein IIb/IIIa which is expressed in maturing megakaryocytes; the Zeta subunit of T-cell antigen receptor (TCR) which is expressed in T-cells; CD14 expressed in monocytes and macrophages; villin expressed in certain epithelial tissues and tyrosinase expressed in melanocytes and melanomas. In some cases abnormal expression from tissue specific promoters has been observed in tumour cells, and this may be exploited by using constructs of novel DNA glycosylases and the relevant tissue specific promoter.

When the DNA glycosylase is expressed it may fragment the DNA in the cell and therefore kill the cell. Specific cells may also be targeted through the use of promoters containing other control elements, for example, promoters which are controlled in a cell-cycle or temporal manner or those possessing regulatory elements responsive to internal or external factors, e.g. promoters activatable by specific inducers, e.g. the inducer IPTG, which induces the lac promoter or lac derivatives such as trc, by certain metals (e.g. metallothionein promoter), by certain hormones such as dexamethasone, androgens (on for example the promoter of the gene for prostate specific antigen which is tissue specific), retinoic acid and certain cytokines.

Conceivably, where enzymes of the invention exhibit specific substrate requirements in the sequence surrounding the base for excision, this specificity may be employed by appropriate low level expression of the DNA glycosylase such that only DNA with the specific sequence is made susceptible to degradation.

Thus a further aspect of the invention provides a method of killing cells, comprising the steps of introducing a DNA glycosylase according to the present invention into a cell and expressing said DNA glycosylase in the cell to an extent which results in the killing of that cell. Preferably, the DNA glycosylase according to the present invention is contained within an expression vector, most preferably, a tissue-specific expression vector.

A further use of DNA glycosylases of the invention is for performing enzymatic DNA sequencing. This may be performed in a manner analogous to the chemical sequencing method of Maxam and Gilbert (Maxam and Gilbert (1980) Methods in Enzymology, 65: 499). However, the Maxam-Gilbert procedure involves the use of several very toxic chemicals, such as dimethylsulfate (DMS) and hydrazine (the latter is also explosive), and use of the glycosylases of the invention presents a considerable advantage. Enzymatic sequencing may be performed for example by end-labelling the sample DNA fragment appropriately, for example with ³²P, ³³P or ³⁵S. For identifying the positions of cytosines and thymines in the DNA, the DNA is treated with limiting amounts of cytosine-DNA glycosylase and thymine-DNA glycosylase according to the invention, respectively. The resulting AP-sites are then cleaved, e.g. by alkaline solution (pyridine) or by an AP-endonuclease. The resulting end-labelled fragments are subsequently separated e.g. by electrophoresis and the position of fragments of varying length identified appropriately, e.g. by autoradiography. Ideally, the positions of adenines and guanines should be determined in the same way using adenine- or guanine-DNA glycosylases. At the present time such enzymes are not available. However, the E. coli DNA repair enzymes Tag and AlkA recognize adenine alkylated in the 3-position (Tag, AlkA) and guanine alkylated in the 3-position (AlkA). Thus, one way of determining the positions of adenines and guanines may be after alkylation of DNA with DMS, followed by treatment with AlkA and Tag. The subsequent experimental procedure may be performed as for determining the C and T positions.
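The way base positions are read from fragment lengths can be sketched as follows. This is a minimal illustration under stated assumptions: the label is taken to be on the 5′ end, cleavage at an AP-site at position i (counted from 1) is taken to leave a labelled fragment of i − 1 nucleotides, and the function names and example sequence are hypothetical.

```python
def labelled_fragment_lengths(seq, base):
    """Lengths of 5'-end-labelled fragments after cleavage at every `base`.

    Assumes (for illustration) that cleaving the AP-site left at 1-based
    position i yields a labelled fragment of i - 1 nucleotides, which is
    the 0-based index returned by enumerate().
    """
    return [i for i, b in enumerate(seq) if b == base]


def read_pyrimidines(seq_length, c_lengths, t_lengths):
    """Rebuild a partial sequence from the C and T lanes; other bases are 'N'."""
    calls = ["N"] * seq_length
    for n in c_lengths:
        calls[n] = "C"
    for n in t_lengths:
        calls[n] = "T"
    return "".join(calls)


# Hypothetical 8-nucleotide template, used only to exercise the functions.
template = "ACGTTCAG"
c_lane = labelled_fragment_lengths(template, "C")  # CDG lane
t_lane = labelled_fragment_lengths(template, "T")  # TDG lane
print(c_lane, t_lane, read_pyrimidines(len(template), c_lane, t_lane))
```

The purine positions remain undetermined ("N") in this sketch, mirroring the point in the text that a separate approach, such as alkylation followed by AlkA and Tag treatment, would be needed for A and G.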

Thus, a further aspect of the invention provides a method of performing enzymatic DNA sequencing to determine the position of cytosine and/or thymine bases by treating said DNA with at least one CDG and/or TDG of the invention.

The invention will now be described more specifically in the following non-limiting Examples with reference to the following drawings in which:

FIG. 1 comprises graphs showing in vitro excision of radiolabelled material from double stranded (ds) or single stranded (ss) [³H]cytosine-labelled DNA substrate (C-substrate) and [³H]thymine-labelled DNA substrate (T-substrate) by human UDG-mutants (CDG: Panel A, TDG: Panel B). The data represent mean values from two independent experiments each in duplicate for each time point. Symbols in panel B of FIG. 1 are as indicated in panel A;

FIG. 2 comprises graphs showing analysis of the radioactive excision products of substrate DNA by UDG (panel A) and UDG mutants TDG (Panel B) and CDG (Panel C), performed by thin layer chromatography. U-substrate is indicated by stars (⋆), other symbols are as in FIG. 1. The migration of unlabelled standards (the free bases uracil, cytosine or thymine) is indicated as rectangles, marked respectively U-marker, C-marker and T-marker over the relevant fraction numbers;

FIG. 3 shows a revised organisation of the human UNG gene. The restriction maps with EcoRI, HindIII, SacI and XbaI are indicated. Exons are shown as black boxes and are numbered by Roman numerals. Exon 1A is a previously unrecognised exon. Interspersed repeats are indicated (−: Alu, *: MER, ♦: MIR, ★: position of a 300 bp TA dinucleotide repeat);

FIG. 4 shows the generation of human UNG1 and UNG2 by transcription from two promoters and alternative splicing. P2 is the previously recognised promoter for transcription of UNG1 (Haug et al., 1994, FEBS Letters, 353, p180-184) and P1 the promoter from which UNG2 is transcribed. Exon 1A encodes 44 amino acids present in UNG2, but absent in UNG1. The 35 N-terminal codons of exon 1B are only present in UNG1. The presequence of UNG2 is shown on top with the putative nuclear localization signal underlined. The presequence of UNG1 directing mitochondrial import is shown in the bottom line;

FIG. 5 shows the structure of the 5′-terminal part of the human UNG gene (SEQUENCE I.D. No. 7). Bold letters indicate exons (1A and 1B);

FIG. 6 shows the alignment of UNG proteins from man and mouse (SEQUENCE ID Nos 8 (hUNG1), 9 (mUNG1), 2 (hUNG2) and 10 (mUNG2)). Note that UNG1 and UNG2 proteins have been aligned separately down to the common splice corresponding to codon 44 in human UNG2. The presequence not present in the catalytically active form of human placental uracil-DNA glycosylase originally isolated, residues 1-77 in human UNG1 (Wittwer et al., 1989, Biochemistry, 28, p780-784) is shown in bold letters. Downstream of the alternative splice site (↓) used for generating UNG2 forms (from 45 in human UNG2), the sequences of the two forms are identical in each species. Residues that make up walls of the uracil-binding pocket or which are directly involved in catalysis are marked with a star (★). Residues that are involved in DNA-binding (except those involved in uracil-binding) are marked with a triangle (▴); and

FIG. 7 shows the subcellular localization in HeLa cells of UNG2-EGFP-N1 and UNG1-EGFP-N1 fusion products. HeLa cells were transfected with constructs expressing pUNG2-EGFP-N1 (C), pUNG1-EGFP-N1 (D) or the control pEGFP-N1 (A), all expressed from the CMV promoter, and processed for confocal microscopy. Panel B shows staining of mitochondria with Texas red.

EXAMPLE 1 Site Directed Mutagenesis of Human UDG Codons

Site directed mutagenesis was performed on the relevant codons in human UDG and the proteins expressed in Escherichia coli.

Methods

Site-directed mutagenesis was carried out as in Mol et al., 1995, supra. To obtain the Tyr147Ala mutant, codon 147, TAT (Tyr), was changed to GCT (Ala), and to obtain the Asn204Asp mutant, codon 204, AAC (Asn), was changed to GAC (Asp). Mutated DNA fragments were subcloned into human UDG expression construct pTUNGΔ84 by replacing restriction fragments in the expression construct by fragments containing the respective mutations. In Escherichia coli pTUNGΔ84 expresses high levels of a fully active human UDG (UNGΔ84) lacking 7 non-essential and non-conserved NH₂-terminal residues of the mature form of UDG (Mol et al., 1995, supra; Slupphaug et al., 1995, supra). Expression of mutant proteins in Escherichia coli and purification of the mutant proteins to apparent homogeneity were carried out as described previously (Mol et al., 1995, supra; Slupphaug et al., 1995, supra). Relevant fractions were assayed for DNA glycosylase activity during each step in the purification. As a result of high expression, purification may also take advantage of the UV absorption of the enzymes. Peaks of UV absorption corresponding to the enzyme of interest could already be observed after only the first two column steps.
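The two codon changes can be checked against the standard genetic code with a short script. This is an illustrative sketch only: the function name is hypothetical, and the lookup table deliberately contains just the four codons named in the text.

```python
# Only the four codons mentioned in the text (standard genetic code).
CODONS = {"TAT": "Tyr", "GCT": "Ala", "AAC": "Asn", "GAC": "Asp"}


def describe_substitution(position, old_codon, new_codon):
    """Return the mutant name and the nucleotide changes within the codon.

    Each change is (position in codon counted from 1, old base, new base).
    """
    changes = [
        (i + 1, old, new)
        for i, (old, new) in enumerate(zip(old_codon, new_codon))
        if old != new
    ]
    name = f"{CODONS[old_codon]}{position}{CODONS[new_codon]}"
    return name, changes


print(describe_substitution(147, "TAT", "GCT"))
print(describe_substitution(204, "AAC", "GAC"))
```

The Asn204Asp change involves a single nucleotide (A to G at the first codon position), whereas Tyr147Ala requires two nucleotide changes within the codon.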

To test enzymatic substrate specificities, 250 ng purified human “wild type” UDG (UNGΔ84), UNGΔ84Tyr147Ala or UNGΔ84Asn204Asp, were mixed with 200 ng ds- or ss-[³H]cytosine-labelled DNA (150 mCi/mmol), or ds- or ss-[³H]thymine-labelled DNA (100 mCi/mmol) in 10 mM NaCl, 20 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mM dithiothreitol and 0.5 mg/ml bovine serum albumin (final concentrations) in 20 μl separate reactions. The final concentrations of the [³H]cytosine-DNA (C) and [³H]thymine-DNA (T) substrates were 6.5 μM and 10 μM, respectively. Release of radioactivity as a function of time was measured at 37° C. These conditions are later referred to as standard conditions. Substrate synthesis and processing of samples for scintillation counting were as described in Krokan and Wittwer (1981) Nucl. Acids Res., 9: 2599-2613. Single-stranded substrate was generated by boiling double-stranded substrate for 10 min, followed by rapid cooling on ice.

Results

FIG. 1 demonstrates time-dependent release of acid-soluble radioactivity by homogeneous UNGΔ84Asn204Asp (CDG) from [³H]cytosine-labelled DNA, but not from [³H]thymine-labelled DNA. Conversely, the homogeneous UNGΔ84Tyr147Ala (TDG) mutant releases acid-soluble radiolabelled material from [³H]thymine-labelled DNA, but not from [³H]cytosine-labelled DNA.

EXAMPLE 2 Analysis of the Radioactive Excision Products by Thin Layer Chromatography

Methods

The analysis was performed using DC-cellulose as the stationary phase and methanol:HCl:H₂O (70:20:10) as the mobile phase. Samples were prepared as follows: 1.5 μg enzyme (UNGΔ84, UNGΔ84Tyr147Ala or UNGΔ84Asn204Asp as prepared in Example 1) was incubated with 1 μg [³H]uracil-labelled DNA (500 mCi/mmol), [³H]cytosine-labelled DNA (150 mCi/mmol) or [³H]thymine-labelled DNA (100 mCi/mmol) in separate 50 μl reactions under standard buffer conditions (see Example 1) for 1 hour. Macromolecules in the samples were then ethanol precipitated, the supernatants after centrifugation were collected, ethanol was removed by evaporation and the remaining material was dissolved in 10 μl H₂O. 1 μl was spotted on the membrane. After migration the cellulose sheet was cut in strips and radioactivity measured by scintillation counting in Ready Protein® scintillation cocktail.

Results

Separation of the acid-soluble radioactive material by thin layer chromatography (FIG. 2) demonstrated that the released material was the free bases [³H]cytosine or [³H]thymine. Separation using another mobile phase (butanol:H₂O, 86:14) verified these results (data not shown). In addition, both mutants release [³H]uracil, whereas “wild type” UDG (UNGΔ84) releases [³H]uracil only (FIG. 2).

EXAMPLE 3 Substrate Specificity, Uracil Inhibition and Kinetic Properties of UDG-mutants

Methods

For measuring release of uracil from double-stranded (U-ds) or single-stranded (U-ss) DNA the various mutant enzymes (prepared as described in Example 1 and by analogous site-directed mutagenesis methods and identical expression and purification methods) were incubated with 200 ng ds- or ss-[³H]dUMP-labelled DNA (500 mCi/mmol, 2 μM final concentration) in 20 μl separate reactions for 10 min under standard conditions as described in Example 1. For measuring release of [³H]cytosine or [³H]thymine, assays were performed as described in Example 1 using an incubation time of 10 min. Uracil inhibition was analysed by adding 5 mM uracil (final concentration) to a standard U-ds assay. An activity of 0 indicates activity below the detection limit (10 pmol per mg protein per min) with 100 ng enzyme and 200 ng DNA substrate at standard conditions.

The kinetic parameters were determined using six different substrate concentrations to obtain the K_(m) and V_(max) values. Duplicate samples were incubated for 20 min using standard assay buffer conditions and substrates as specified. K_(m) and V_(max) were calculated using the computer program Enzpack, version 3.0 after the method of Wilkinson (1961) Biochem. J., 80: 324-332. K_(cat) was calculated from V_(max) assuming an M_(r)=25000.
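The conversion from V_(max) to K_(cat) is a unit calculation and can be sketched as below. The function name and the choice of expressing V_(max) in pmol per min per mg protein are assumptions made for illustration; the only value taken from the text is M_(r) = 25000.

```python
def kcat_per_min(vmax_pmol_per_min_per_mg, mr=25000.0):
    """Turnover number (min^-1) from a specific Vmax.

    1 mg of enzyme of relative molecular mass `mr` contains
    1e-3 g / mr mol = 1e9 / mr pmol of enzyme, so dividing pmol of product
    per min per mg by pmol of enzyme per mg gives product per enzyme per min.
    """
    pmol_enzyme_per_mg = 1e9 / mr
    return vmax_pmol_per_min_per_mg / pmol_enzyme_per_mg


# Sanity check with a hypothetical Vmax: 4.0e4 pmol/min/mg corresponds to
# one product molecule per enzyme molecule per minute at Mr = 25000.
print(kcat_per_min(4.0e4))
```

With M_(r) = 25000, 1 mg of protein corresponds to 4 × 10⁴ pmol of enzyme, so a turnover number of 1 min⁻¹ is equivalent to a V_(max) of 4 × 10⁴ pmol per min per mg.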

Results

The results are shown in Tables 1 and 2. From Table 1 it can be seen that only the substitution Tyr147Ala results in an enzyme which specifically excises thymine. Similarly, only the substitution Asn204Asp results in a mutant which excises cytosine. Both mutant enzymes exhibit activity on single or double-stranded DNA and are also able to excise uracil. From Table 2 it can be seen that the turnover numbers of CDG and TDG are lower than for “wild type” release of uracil.

Discussion

These results demonstrate the significance of Asn204 for specific binding of uracil-containing DNA and the significance of Tyr147 side chain ring structure for preventing binding of thymine.

It is somewhat surprising that the novel CDG of the invention still recognizes uracil, considering the unfavourable proximity of the Asp carboxyl side chain and the O4 atom in uracil. However, it should be noted that the other oxygen atom of the Asp204 carboxyl side chain still may form H-bonds with N3 uracil and that Asp145 main chain carbonyl as well as the amide-N of Asp145 and Gln144 also contribute to the specificity. In addition, the UDG activity remaining is very low (0.04-0.16%) compared with “wild type”. CDG has a 10-fold increased preference for single stranded substrate, whereas TDG has a decreased preference (FIG. 1 and Table 1).
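The residual activities and strand preferences quoted above can be recomputed from the Table 1 entries. The sketch below copies only the relevant values from Table 1 (reading the C-ss entry for Asn204Asp as 3.0 × 10³); the dictionary layout and function names are illustrative assumptions.

```python
# Activities in pmol excised per min per mg protein, copied from Table 1.
TABLE_1 = {
    "Wild type": {"U-ds": 4.7e7, "U-ss": 9.5e7},
    "Asn204Asp": {"U-ds": 1.7e4, "U-ss": 1.6e5, "C-ds": 3.0e2, "C-ss": 3.0e3},
    "Tyr147Ala": {"U-ds": 2.2e4, "U-ss": 2.2e4, "T-ds": 1.3e3, "T-ss": 7.5e2},
}


def residual_udg_percent(mutant, substrate):
    """Uracil excision by `mutant` as a percentage of the wild-type value."""
    return 100.0 * TABLE_1[mutant][substrate] / TABLE_1["Wild type"][substrate]


def ss_over_ds(mutant, base):
    """Ratio of activity on single-stranded to double-stranded substrate."""
    return TABLE_1[mutant][f"{base}-ss"] / TABLE_1[mutant][f"{base}-ds"]


print(residual_udg_percent("Asn204Asp", "U-ds"))  # CDG, double-stranded
print(residual_udg_percent("Asn204Asp", "U-ss"))  # CDG, single-stranded
print(ss_over_ds("Asn204Asp", "C"))               # CDG strand preference
print(ss_over_ds("Tyr147Ala", "T"))               # TDG strand preference
```

On these numbers the residual uracil excision of CDG comes out at a few hundredths to under two tenths of a percent of wild type, the CDG ss/ds ratio for cytosine is 10, and the TDG ss/ds ratio for thymine is below 1, in line with the statements in the paragraph.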

It is evident that the turnover numbers (K_(cat)) of the novel enzymes releasing either cytosine or thymine, as well as residual UDG activities, are very low when compared with release of uracil by “wild type” UDG (Table 2). However, the very high turnover number of UDG appears to be unique among DNA glycosylases and turnover numbers of other DNA glycosylases may be as low, or even lower than those of the engineered glycosylases CDG and TDG. Thus, a recent biochemical characterisation of recombinant N-methylpurine-DNA glycosylase from mouse gave K_(cat) values of 0.8 min⁻¹ and 0.2 min⁻¹ for excision of 3-methyladenine and 7-methylguanine respectively (Roy et al. (1994) Biochemistry, 33: 15131-15140).

The Escherichia coli inducible 3-methyladenine-DNA glycosylase II (AlkA) is a DNA glycosylase that recognizes at least 6 different damaged bases, among these structurally different alkylated purines and pyrimidines. The turnover number for AlkA on the substrate 3-methyladenine-DNA is calculated to be 0.03 min⁻¹ (Bjelland et al. (1994) J. Biol. Chem., 269: 30489-30495).

The FPG protein (formamido-pyrimidine-DNA glycosylase) also has a rather low turnover number. The K_(cat) value on the imidazole ring-opened form of 7-methylguanine-DNA substrate is calculated to be 1.4 min⁻¹ (Boiteux et al. (1990) J. Biol. Chem., 265: 3916-3922). A low rate of catalysis is also likely for the naturally occurring T(U)/G-mismatch-DNA glycosylase since band shifts can be demonstrated after mixing the enzyme with substrate (Sassanfar & Roberts (1990) J. Mol. Biol., 212: 79-96).

All of these DNA glycosylases recognize at least two different substrates, and in most cases several damaged pyrimidines or purines. Probably the very high turnover number of UDG reflects a high selectivity of substrate binding in a tight fitting active site, allowing rapid catalysis by this specialized enzyme. In contrast, the DNA glycosylases with a broader substrate specificity may bind substrate less accurately, and excise the base more slowly.

TABLE 1

Activities in pmol excised per min per mg protein; last column: inhibition (%) by 5 mM uracil.

Mutant        U-ds        U-ss        C-ds        C-ss        T-ds        T-ss        Inhibition (%)
“Wild type”   4.7 × 10⁷   9.5 × 10⁷   0           0           0           0           80
Gln144Leu     3.4 × 10⁴   4.8 × 10⁴   0           0           0           0           25
Asp145Glu     5.5 × 10⁴   8.5 × 10⁴   0           0           0           0           80
Asp145Asn     1.4 × 10⁴   1.1 × 10⁴   0           0           0           0           80
Tyr147Ala     2.2 × 10⁴   2.2 × 10⁴   0           0           1.3 × 10³   7.5 × 10²   0
Tyr147Phe     3.2 × 10⁷   6.3 × 10⁷   0           0           0           0           50
Ser169Ala     3.1 × 10⁶   5.6 × 10⁶   0           0           0           0           80
Asn204Asp     1.7 × 10⁴   1.6 × 10⁵   3.0 × 10²   3.0 × 10³   0           0           0
Asn204Gln     1.5 × 10⁶   2.2 × 10⁶   0           0           0           0           70
His268Leu     1.3 × 10⁵   2.6 × 10⁵   0           0           0           0           75

TABLE 2

K_(m) in μM; K_(cat) in min⁻¹.

Substrate     C-ds              C-ss              T-ds              T-ss              U-ds              U-ss
Mutant        K_(m)    K_(cat)  K_(m)    K_(cat)  K_(m)    K_(cat)  K_(m)    K_(cat)  K_(m)    K_(cat)  K_(m)    K_(cat)
“Wild type”   -        -        -        -        -        -        -        -        0.10     2500     0.06     5150
Tyr147Ala     -        -        -        -        6.0      0.06     1.4      0.02     3.5      1.0      0.30     0.6
Tyr147Phe     -        -        -        -        -        -        -        -        0.16     1225     0.10     2370
Asn204Asp     35       0.12     5.3      0.39     -        -        -        -        2.4      1.2      2.0      15
Asn204Gln     -        -        -        -        -        -        -        -        0.40     66       0.23     89

EXAMPLE 4 Effects of TDG and CDG Activity on Frequency of Rifampicin Resistant Mutations in E. coli ung⁺ Strain (NR8051) and E. coli ung⁻ Strain (NR8052)

Methods

Overnight cultures of E. coli strains NR8051 and NR8052 (both recA⁺, provided by Tomas Kunkel of the National Institute of Environmental Health, USA) containing plasmid pTrc99A, pTUNGΔ84, UNGΔ84Tyr147Ala or UNGΔ84Asn204Asp (prepared as described in Example 1) were grown in LB-medium with ampicillin (100 μg/ml) at 30° C. Each culture was then diluted 1:20 in fresh medium and cultured for 5 hours at 37° C. Induced cultures contained 1 mM IPTG in the LB-medium. To determine the number of rifampicin resistant bacteria, 100 μl of the culture were mixed with 3 ml top agarose and poured on LB plates containing 100 μg/ml rifampicin and incubated overnight at 37° C. Colonies were counted and the number of rifampicin resistant colonies per 10⁸ viable cells was calculated.

Results

The results are shown in Table 3. These results indicate that the expression of UDG does not cause an increase in mutation frequencies (plasmid pTUNGΔ84 compared to parental pTrc99A). In fact, human UDG complements E. coli ung⁻ cells. This is clear from the reduction in mutation frequencies from 4.4 to 1.3 when UDG is present in induced cells. Uninduced cells are also protected as a result of promoter leakage. In contrast, the mutation frequencies of E. coli ung⁺ cells are increased by a factor of 8.6 and 39 when carrying plasmids encoding CDG and TDG, respectively, compared to host cells carrying the parental plasmid pTrc99A. This increases to approximately 8.9 and 94.4 respectively, in induced ung⁺ cells.
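The fold increases quoted above follow directly from the mean values in Table 3 for strain NR8051 (ung⁺). The following sketch recomputes them; the dictionary layout and function name are illustrative assumptions, and only the means (not the standard deviations) are used.

```python
# Mean rif^R mutation frequencies per 10^8 cells in NR8051 (ung+), Table 3.
NR8051 = {
    "pTrc99A": {"uninduced": 0.8, "induced": 0.9},  # parental plasmid
    "CDG": {"uninduced": 6.9, "induced": 8.0},      # pTUNGΔ84Asn204Asp
    "TDG": {"uninduced": 31.0, "induced": 85.0},    # pTUNGΔ84Tyr147Ala
}


def fold_increase(plasmid, condition, table=NR8051, control="pTrc99A"):
    """Mutation frequency relative to the parental plasmid, same condition."""
    return table[plasmid][condition] / table[control][condition]


for plasmid in ("CDG", "TDG"):
    for condition in ("uninduced", "induced"):
        print(plasmid, condition, fold_increase(plasmid, condition))
```

For the uninduced cultures the ratios are 6.9/0.8 and 31/0.8, and for the induced cultures 8.0/0.9 and 85/0.9, which reproduces the factors given in the text after rounding.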

Discussion

Single amino acid substitutions transform the highly uracil-selective uracil-DNA glycosylase into less selective DNA glycosylases that attack normal pyrimidines and confer a mutator phenotype upon the cell, presumably because excess numbers of apyrimidinic-sites are formed.

It may seem surprising that propagation of plasmids expressing CDG or TDG activity is at all possible, since they might be expected to kill the host cells. We believe that the relatively low turnover numbers and the low expression in the absence of inducer (IPTG) are sufficient to reduce the number of depyrimidinations to a level that the DNA repair system can cope with. Nevertheless, DNA degradation is detectable even in the absence of inducer and is strongly increased when IPTG is added (data not shown). The survival of Escherichia coli recA⁺ host cells carrying uninduced CDG or TDG-plasmid is equal to that of the parental cell carrying plasmid pTrc99A (data not shown) although mutation frequencies are increased by a factor of 8.6 and 39 for CDG and TDG, respectively, as mentioned above.

Induction of CDG or TDG by IPTG reduces survival of the Escherichia coli host cells (in both NR8051 and NR8052) to less than 50% and 10%, respectively, within 5 hrs. Thus, AP-site repair capacity is sufficient for repair of damage caused by expression of CDG or TDG due to “leakage” from the uninduced promoter. However, this repair is apparently not complete, or may be inaccurate, since the frequency of mutations leading to rifampicin resistance is significantly increased by induction with IPTG (Table 3). The activity of TDG in vivo leads to a 10-fold higher mutation frequency in Escherichia coli than the in vivo CDG activity. This probably reflects the fact that TDG has a higher activity on dsDNA than CDG, as demonstrated by in vitro experiments with homogeneous enzyme (FIG. 1), and that the K_(m) value for TDG on dsDNA is much lower than the K_(m) for CDG on dsDNA (Table 2). We have observed that TDG and CDG are both highly cytotoxic in a recA⁻ background (Escherichia coli DH5α) even without induction (data not shown). It is likely that this cytotoxic effect is due to a lack of SOS-induction in recA⁻ cells. The chemical nature of the SOS-inducing signal, or signals, is not fully known, and some DNA lesions may indirectly activate the SOS response by interfering with DNA replication (Sassanfar & Roberts, 1990, supra). If generation of AP-sites by TDG and CDG directly or indirectly triggers SOS-induction, this would increase cell survival, at the cost of error prone repair and a high yield of mutations. CDG and TDG should be very useful for exploring the biological consequences of AP-sites in DNA.

The new DNA glycosylases that we have engineered are distinctly different from previously known glycosylases. The mismatch-specific thymine-DNA glycosylase previously reported also releases uracil (Sassanfar & Roberts, 1990, supra; Nedderman & Jiricny, 1993, supra), like the thymine-DNA glycosylase we have constructed. However, while the naturally occurring thymine-DNA glycosylase has an absolute requirement for a mismatched U or T opposite a G, the TDG we have engineered recognises T or U from T(U):A matches, as well as from single stranded substrate. A DNA glycosylase recognizing unmodified cytosine had previously not been reported.

The mutator phenotype caused by a single amino acid substitution is intriguing since it changes an enzyme from its normal role in mutation avoidance into a cytotoxic mutator protein. In the case of CDG this change is the result of a single A→G transition, which in vivo could be the result of several different events, such as deamination of A, O4-alkylation of T in the complementary strand, and replication errors. Since this mutation would be dominant, only one allele would need to be mutated to get a new phenotype. It is possible, however, that this mutation would be lethal, or that it would be without serious consequences due to efficient repair of DNA in mammalian cells. Nevertheless, the generation of repair enzymes having a dominant mutator effect that would give the cells a hypermutable phenotype may represent a new principle in mutagenesis.

TABLE 3 Frequency of rif^(R) mutations per 10⁸ cells

                      NR8051                   NR8052
Plasmid               Uninduced    Induced     Uninduced    Induced
pTRC99A               0.8 ± 0.3    0.9 ± 0.5   4.2 ± 1.2    4.4 ± 1.1
pTUNGΔ84              0.7 ± 0.4    0.8 ± 0.3   0.9 ± 0.2    1.3 ± 0.4
pTUNGΔ84 Tyr147Ala    31 ± 8       85 ± 32     13 ± 5       57 ± 9
pTUNGΔ84 Asn204Asp    6.9 ± 4.1    8.0 ± 3.4   2.1 ± 0.7    4.1 ± 1.1
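The roughly 10-fold difference between TDG and CDG stated in the text can be checked against the induced values in Table 3. The following is only an illustrative sketch: it uses the tabulated means, ignores the reported standard deviations, and the variable and function names are hypothetical.

```python
# Mean rif^R frequencies per 10^8 cells after IPTG induction, copied from Table 3.
# TDG is the Tyr147Ala mutant; CDG is the Asn204Asp mutant.
induced = {
    "NR8051": {"TDG": 85.0, "CDG": 8.0},
    "NR8052": {"TDG": 57.0, "CDG": 4.1},
}


def fold_tdg_over_cdg(values):
    """Ratio of the TDG to the CDG mutation frequency for one host strain."""
    return values["TDG"] / values["CDG"]


fold = {strain: fold_tdg_over_cdg(values) for strain, values in induced.items()}

for strain, ratio in fold.items():
    print(f"{strain}: TDG/CDG = {ratio:.1f}-fold")
```

Both host strains give a ratio on the order of 10, in line with the statement in the text; the large standard deviations in Table 3 mean the exact ratios should not be over-interpreted.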

EXAMPLE 5 Effects of TDG Activity on the Frequency of Rifampicin Resistant Mutations in E. coli Strains BW527 (endoIV⁻) and GW2100 (umuC⁻)

Overnight cultures of E. coli strains BW527 (endoIV⁻) or GW2100 (umuC⁻), provided by Erling Seeberg, The National Hospital, Oslo, containing plasmids pTrc99A or UNGΔ84Tyr147Ala (TDG) were prepared as described in Example 1 and grown in LB-medium with ampicillin (100 μg/ml) at 30° C. Each culture was then diluted 1:20 in fresh medium and cultured as described in Example 4.

Results

The results are shown in Table 4. These results indicate that the expression of UNGΔ84Tyr147Ala (TDG) in E. coli strains BW527 (endoIV⁻) or GW2100 (umuC⁻) enhances the mutagenic effect of TDG compared to strains that do not carry these defects in the repair of AP-sites or in umuC, especially after induction with IPTG. pTrc99A alone did not exert this effect to any significant extent. Even more importantly, the background mutation frequency in these strains is low and the effect of induction with IPTG is high, thus improving the usefulness of UNGΔ84Tyr147Ala (TDG) for mutagenesis when using more optimal strains.

These results are especially surprising in light of previous findings that mutants in umuC are generally difficult to mutate by some methods, for example by UV-light or by chemical challenge.

TABLE 4 Effects of TDG-activity on frequency of rifampicin resistant mutations in E. coli strains BW527 and GW2100

Frequency of rif^(R) mutations per 10⁸ cells

                      BW527                        GW2100
Plasmid               Uninduced     Induced        Uninduced                   Induced
pTRC99A               0.06 ± 0.02   0.07 ± 0.03    0.24 × 10⁻³ ± 0.1 × 10⁻³    6 × 10⁻³ ± 5 × 10⁻³
pTUNGΔ84 Tyr147Ala    1.20 ± 0.2    240 ± 122      0.65 ± 3829                 84 ± 28

EXAMPLE 6 Isolation and Characterisation of a Nuclear Form of Uracil-DNA glycosylase

Materials and Methods

Materials

Mouse embryonic carcinoma cDNA library, human liver cDNA library and NT2 neuronal precursor cell cDNA library were from Stratagene (La Jolla, Calif., USA). All libraries were propagated in the Uni-ZAP® XR vector using XL-1 blue as host. [α-³²P]dCTP, [³⁵S]methionine, Rediprime® random labelling kit and HYBOND® N+ filters were all from Amersham (UK). All sequencing primers were from MedProbe (Oslo, Norway). Dye terminator cycle sequencing ready reaction kit was from Applied Biosystems (Foster City, Calif.). The Dynazyme® PCR kit was purchased from Finnzymes Oy (Espoo, Finland). TNT® in vitro transcription/translation rabbit reticulocyte lysate system kit, pGEM®-T TA cloning kit, Altered Sites® II in vitro Mutagenesis System, primers for sequencing from T3 and T7 promoters and T3 RNA polymerase were from Promega (Madison, Wis.). The plasmid encoding the red-shifted variant of green fluorescent protein (pEGFP-N1) was from Clontech (Palo Alto, Calif., USA). Restriction enzymes were from New England Biolabs Inc. (Beverly, Mass., USA).

Screening of cDNA Libraries

All libraries were screened as recommended by the manufacturer, using ³²P-labelled UNG40 cDNA (Olsen et al., 1989, EMBO J., 8, p 3121-3125) as probe. Hybridization was carried out at 65° C. overnight in 6×SSC, 5×Denhardt's solution and 0.1% SDS. Filters were washed in 0.1×SSC/0.5% SDS at 65° C. and autoradiographed. Three rounds of screening were done. In vivo excision of pBluescript® phagemids from the Uni-ZAP® XR vector was performed as recommended by the manufacturer.

Sequence Analysis of Clones

Sequencing was performed on an Applied Biosystems Model 373A DNA Sequencing System using the Dye terminator cycle sequencing ready reaction kit as recommended by the manufacturer. The sequences were analysed using the Auto Assembler software (Applied Biosystems).

In Vitro Transcription, Uracil-DNA Glycosylase Assays and Transient Transfection of HeLa Cells for Promoter Studies

In vitro transcription/translation was performed with the TNT® transcription/translation system with [³⁵S]methionine as recommended by the manufacturer, using 200 ng of the expression constructs per 10 μl reaction volume. The mouse UNG1-pBluescript® construct was transcribed from the T3 promoter in the pBluescript® vector. The insert of mouse UNG2-pBluescript® was amplified by the polymerase chain reaction using Dynazyme® PCR kit, ligated into the pGEM-T vector and transcribed from the T7 promoter. The human UNG2-pBluescript was transcribed from the T3 promoter after SacII/NheI excision of a 79 bp fragment from the polylinker and the 5′-end of cDNA for UNG2. Human UNG1 cDNA was transcribed from the T7 promoter as previously described (Slupphaug et al., 1995, Biochemistry, 34, p 128-138). The samples were run on a 12% denaturing sodium dodecyl sulfate polyacrylamide gel (SDS-PAGE). The gel was dried, autoradiographed overnight and scanned on an LKB Ultroscan XL Enhanced Laser Densitometer. Uracil-DNA glycosylase activity was measured in parallel samples of the in vitro transcription/translation assay mixture containing unlabelled amino acids (Slupphaug et al., 1995, supra). A construct containing both promoters (pGL2-ProAB) linked to the luciferase gene was prepared by insertion of a PvuII/MluI fragment (the enzymes cleave in positions 418 and 1035, respectively) from the promoter region of the UNG gene into the SmaI-MluI sites of pGL2-ProB. A promoter II-luciferase construct (pGL2-ProB) and transient transfection with Transfectam® (Promega) have been described previously (Haug et al., 1994, FEBS Letters, 353, p 180-184).

Preparation of pUNG-EGFP-N1 Fusion Constructs and Localization Studies

UNG15 cDNA, which encodes UNG1, in pGEM7Zf+ (pUNG15) (Slupphaug et al., 1995, supra; Olsen et al., 1989, supra) was digested with BclI, which cuts at bp 1019 in UNG15 cDNA, blunted with DNA polymerase I (Klenow fragment), and ligated to an AgeI linker prepared from the oligonucleotide 5′-ACCGGTGCC-3′ and its complementary copy. The religated pUNG15 containing the AgeI linker correctly ligated into the BclI site (verified by sequencing) was digested with RsrII, which cuts at bp 49 in UNG15 cDNA (Olsen et al., 1989, supra), blunted as above and finally digested with AgeI. The fragment was then ligated into pEGFP-N1 digested with SmaI (blunt) and AgeI. The construct was sequenced to verify that the insert was in frame with the ATG of the EGFP-N1 fusion protein. The TGA stop codon of pUNG15 was changed to GGA by site-directed mutagenesis performed according to the procedure provided by the manufacturer using ssDNA prepared with R408 phage. Potential pUNG1_(GGA)-EGFP-N1 constructs were screened by digestion with BclI (digests only unmutated plasmids) and verified by sequencing. The correct construct was named pUNG1-EGFP-N1. cDNA for UNG2 (this example) in pBluescript was digested with NheI, which cuts 54 bp upstream of ATG, and EcoNI, which cleaves the cDNAs in the sequence that is shared by cDNAs for UNG1 and UNG2 (positions 529 and 520, respectively). The resulting fragment of interest (501 bp) was isolated and ligated to the 5155 bp fragment of NheI/EcoNI-digested pUNG1-EGFP-N1 to obtain pUNG2-EGFP-N1. Transient transfections of HeLa cells were done with the CaPO₄-method (Profection®, Promega) according to the manufacturer's recommendations. Confocal microscopy (BioRad MRC-600) of HeLa cells and staining of mitochondria with mouse anti human mitochondria antibody (MAB 1273, Chemicon) and Texas Red anti-mouse IgG (Vector) were performed as previously described (Nagelhus et al., 1995, Exptl. Cell Res., 220, p 292-297). Examination of HeLa cells transfected with expression plasmids pEGFP-N1, pUNG1-EGFP-N1 or pUNG2-EGFP-N1 was carried out using an excitation wavelength of 488 nm and emission wavelength >515 nm at 16 hours after transfection.

Results

A human NT2 neuronal precursor cell cDNA library and a mouse embryonic carcinoma cDNA library were screened and a new form of human uracil-DNA glycosylase (human UNG2) encoded by the UNG gene, as well as the homologous cDNA from mouse (mouse UNG2), was identified. In addition the cDNA for the mouse homolog (encoding mouse UNG1) of human UNG1 (Olsen et al., 1989, supra) was identified. cDNA for human UNG2 has an ORF encoding 44 N-terminal amino acids not found in human UNG1 whereas cDNA for human UNG1 has an ORF encoding 35 amino acids not found in human UNG2 (FIG. 4). The two forms are identical in the rest of the amino acid presequence, which is not required for enzyme activity, as well as in the catalytic domain, altogether 269 identical consecutive amino acids. The sequence of the 269 amino acids common to UNG1 and UNG2, and the corresponding DNA sequence, is identical to amino acid residues 35-304 in Olsen et al., 1989, supra. cDNAs for human UNG2 and its mouse homolog are apparently as abundant as UNG1 in cDNA libraries from proliferating cells, since among 20 cDNA clones that were sequenced 10 were of the UNG2 type and 10 were similar to the previously known UNG1 type. Among 4 mouse cDNA sequences, 3 were of the UNG2 type and 1 was of the UNG1 type. However, screening of a human hepatocyte library with UNG40 cDNA resulted in the isolation of 80 strongly hybridizing clones, and sequencing of 14 of these demonstrated that they were all similar to the previously characterized cDNA for UNG1 or the cDNA UNG40 (Olsen et al., 1989, supra).

Comparison of the human cDNA for UNG2 with the recently published complete human UNG sequence (Haug et al., 1996, Genomics, 36, p 408-416) revealed the presence of a previously unrecognised exon (exon 1A) located some 650 base pairs upstream of the previously identified exon 1 (hereinafter called exon 1B). A revised organization of the UNG gene is therefore presented in FIG. 3. Exon 1B forms the leader sequence and codons 1-104 of the mRNA encoding the previously known form UNG1. The mRNA corresponding to the new human cDNA is formed by joining exon 1A (encoding 44 amino acids) into a consensus splice site after codon 35 in exon 1B, after which the two human cDNAs are identical. The open reading frame of human UNG2 cDNA predicts a protein of 313 amino acids, as compared to 304 amino acids for UNG1. Genomic clones for the mouse homolog of the UNG gene have also been isolated and sequenced.

This has revealed that the splice sites for exons 3, 4, 5 and 6 in the UNG genes from mouse and man are in identical positions. Furthermore, PCR analyses have demonstrated that the rest of the mouse gene is structurally similar to the human gene, as expected from the cDNA clones (data not shown).
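The open reading frame lengths quoted above for human UNG1 and UNG2 follow from the splicing scheme by simple arithmetic. The short sketch below only restates figures given in the text (44 residues from exon 1A, 35 UNG1-specific residues before the splice site, 304 residues in UNG1); the constant names are hypothetical.

```python
# Figures taken from the text; the names are illustrative only.
EXON_1A_RESIDUES = 44        # UNG2-specific N-terminal residues encoded by exon 1A
UNG1_SPECIFIC_RESIDUES = 35  # UNG1-specific residues preceding the splice site in exon 1B
UNG1_LENGTH = 304            # residues predicted by the UNG1 open reading frame

# Residues shared by both forms: everything in UNG1 after the splice site.
shared_residues = UNG1_LENGTH - UNG1_SPECIFIC_RESIDUES

# UNG2 is the exon 1A-encoded N-terminus joined to the shared part.
ung2_length = EXON_1A_RESIDUES + shared_residues

print(shared_residues, ung2_length)
```

The result agrees with the 269 common residues and the 313-residue UNG2 protein reported in the text.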

FIG. 4 shows how the alternative forms of mRNA for UNG1 and UNG2 arise, as deduced from human cDNAs and the corresponding UNG sequences, and indicates the presence of a putative nuclear localization signal of 4 basic residues (RKRH) in the N-terminal end of the new cDNA and putative mitochondrial localization signals in cDNA for UNG1. In addition, and not shown here, both human cDNAs contain a putative nuclear localization signal (RKRHH) in the catalytic domain (residues 258-262 in the ORF of cDNA for UNG1). These residues are located at the surface of the enzyme between α-helix 7 and β-strand 4 (Mol et al., 1995, Cell, 80, p 869-878).
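The position of the putative N-terminal signal RKRH can be illustrated by translating the coding sequence of the UNG2-specific N-terminus (SEQ ID NO: 3). The following is a minimal sketch using the standard genetic code; the helper names are hypothetical, and the sequence is copied from the sequence listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# SEQ ID NO: 3, written codon by codon as in the sequence listing.
SEQ_ID_3 = """
ATG ATC GGC CAG AAG ACG CTC TAC TCC TTT TTC TCC CCC AGC CCC GCC
AGG AAG CGA CAC GCC CCC AGC CCC GAG CCG GCC GTC CAG GGG ACC GGC
GTG GCT GGG GTG CCT GAG GAA AGC GGA GAT GCG GCG
"""


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    dna = "".join(dna.split()).upper()  # drop whitespace between codons
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


peptide = translate(SEQ_ID_3)
motif_start = peptide.find("RKRH") + 1  # 1-based residue number

print(peptide)
print("RKRH starts at residue", motif_start)
```

Under these assumptions the translation reproduces the 44-residue peptide of SEQ ID NO: 4, with the RKRH motif at residues 17-20.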

FIG. 5 shows the genomic structure of exons 1A and 1B, as well as the structure of the previously characterized promoter (hereinafter called promoter II), possible elements in the putative promoter upstream of exon 1A (hereinafter called promoter I) and the alternative splice acceptor site (SEQUENCE I.D. No. 7). Promoter I probably starts after the 3′-terminal end of two Alu-repeats (position 425) and ends immediately upstream of the start of exon 1A. However, it cannot be excluded that the promoter is located upstream of the Alu-repeats. This would require the presence of an exon encoding a leader that would be joined to exon 1A. This is considered unlikely since promoter motifs upstream of the Alu-repeats have not been detected and furthermore transcripts of the required size have also not been detected by Northern analyses (data not shown). Furthermore, the cDNA for UNG2 does not contain sequences from this upstream region.

FIG. 6 shows an alignment of predicted amino acid presequence of the human and mouse enzymes (SEQUENCE I.D. Nos 2 and 8-10). Note that UNG1 proteins and UNG2 proteins have been aligned separately in the parts of the proteins that are derived from different exons (up to codon 45 in human UNG2). Table 4 shows the % of identical residues in the different forms, using human UNG2 as the reference (100%). The parts of the protein that are not required for catalytic activity are less well conserved than the catalytic domain. Amino acids that have been found to be critical for catalytic activity or formation of the uracil-binding pocket (Mol et al., 1995, supra; Kavli et al., 1996, EMBO J., 15, p 3442-3447) or DNA binding are completely conserved in mouse (residues Q144, D145, P146, Y147, F158, S169, N204, S247, H268, S270, L272, S273, Y275 and R276 in UNG1).

To compare the promoter activity of promoter I alone and promoter I and promoter II in combination, promoter-luciferase gene constructs were prepared and transient transfection experiments performed with HeLa cells. These studies verified the promoter activity of promoter II alone (Haug et al., 1994, supra) and further demonstrated that when both promoters are present in the construct, the luciferase activity increased some 50%, indicating that promoter I is also active in HeLa cells, as expected from the abundance of the new cDNA in proliferating cells (Table 5).

Coupled transcription-translation of the two forms of human and mouse cDNA resulted in easily measurable uracil-DNA glycosylase activity for both forms from mouse and man. For calculations of the relative specific activities, the radioactivity released in uracil-DNA glycosylase assays was compared to band intensities on an SDS-PAGE gel from transcription/translation reactions using [³⁵S]methionine (Table 6).

To examine whether human UNG1 and UNG2 were translocated to different subcellular compartments, constructs expressing fusion proteins of the UNG proteins and a red-shifted variant of green fluorescent protein (EGFP-N1) were prepared. These were used for transient transfection experiments with HeLa cells. The major advantage of the green fluorescent protein (over the use of antibodies) is that this method relies on the autofluorescence of this protein alone, and thus possible cross reaction of the antibody with epitopes in irrelevant proteins is not a problem. The control (pEGFP-N1) shows that the green fluorescent protein displays a homogeneous staining over the cells (FIG. 7A). In contrast, the UNG2-EGFP-N1 fusion protein is exclusively located in the nuclei (FIG. 7C) and the UNG1-EGFP-N1 fusion protein (FIG. 7D) is mainly, if not exclusively, located in extranuclear spots that have the same appearance as mitochondria stained with Texas red (FIG. 7B). These results provide convincing experimental evidence that UNG2 is a nuclear protein and UNG1 a mitochondrial protein.

TABLE 4 Conservation of amino acids in four homologs of uracil-DNA glycosylase calculated as % identity with human UNG2*

         % identity of domains
         Variant         Common         Catalytic    Overall
         presequence#    presequence    domain       identity
         (1-44)          (45-63)        (64-313)     (1-313)
hUNG1    2               100            100          90
mUNG2    64              75             91           86
mUNG1    2               75             91           79

*The identity is calculated for the domains in UNG2 compared with the corresponding domains in the other forms.
#The identity of the presequences of hUNG1 and mUNG1 is 27% with 82% identity overall.

TABLE 5 Promoter activities in the UNG gene*

Promoter-reporter gene construct    Luciferase activity (%)
pGL2-Basic                          0.8 ± 0.4
pGL2-ProB                           100 ± 8
pGL2-ProAB                          156 ± 4

*The promoter activity of pGL2-ProB (promoter II) was arbitrarily set to 100%. pGL2-Basic is a control lacking promoter.

TABLE 6 Relative specific activities of different forms of UNG after translation in rabbit reticulocyte lysates*

Protein       dpm*    Area (mm²)    Activity (dpm/area)
human UNG1    1291    0.054         23907
human UNG2    6360    0.268         23731
human UNG1    921     0.061         15098
mouse UNG2    856     0.051         16784

*Relative specific activities were calculated from measured dpm-values (³H-uracil released in uracil-DNA glycosylase assays) and areas under the curve of scanned bands on SDS-PAGE gels after subtraction of background values of 123 dpm.
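The activity column of Table 6 can be reproduced by dividing each dpm value by the corresponding band area. The sketch below assumes that the tabulated dpm values are already corrected for the 123 dpm background, since the printed activities equal dpm/area without further subtraction; the row labels are copied as printed and the function name is hypothetical.

```python
# (label as printed in Table 6, dpm, band area in mm^2)
rows = [
    ("human UNG1", 1291, 0.054),
    ("human UNG2", 6360, 0.268),
    ("human UNG1", 921, 0.061),
    ("mouse UNG2", 856, 0.051),
]


def specific_activity(dpm, area_mm2):
    """Relative specific activity: released radioactivity per unit band area."""
    return dpm / area_mm2


# Round to whole numbers, as in the table.
activities = [round(specific_activity(dpm, area)) for _, dpm, area in rows]

for (label, _, _), activity in zip(rows, activities):
    print(f"{label}: {activity} dpm/mm^2")
```

Normalising to band area in this way compensates for the different amounts of protein synthesised in each translation reaction, which is why the two human forms give nearly identical values despite a roughly five-fold difference in raw dpm.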

10 1 2053 DNA Homo sapiens CDS (71)..(1009) 1 cacagccaca gccagggctagcctcgccgg ttcccgggtg gcgcgcgttc gctgcctcct 60 cagctccagg atg atc ggccag aag acg ctc tac tcc ttt ttc tcc ccc 109 Met Ile Gly Gln Lys Thr LeuTyr Ser Phe Phe Ser Pro 1 5 10 agc ccc gcc agg aag cga cac gcc ccc agcccc gag ccg gcc gtc cag 157 Ser Pro Ala Arg Lys Arg His Ala Pro Ser ProGlu Pro Ala Val Gln 15 20 25 ggg acc ggc gtg gct ggg gtg cct gag gaa agcgga gat gcg gcg gcc 205 Gly Thr Gly Val Ala Gly Val Pro Glu Glu Ser GlyAsp Ala Ala Ala 30 35 40 45 atc cca gcc aag aag gcc ccg gct ggg cag gaggag cct ggg acg ccg 253 Ile Pro Ala Lys Lys Ala Pro Ala Gly Gln Glu GluPro Gly Thr Pro 50 55 60 ccc tcc tcg ccg ctg agt gcc gag cag ttg gac cggatc cag agg aac 301 Pro Ser Ser Pro Leu Ser Ala Glu Gln Leu Asp Arg IleGln Arg Asn 65 70 75 aag gcc gcg gcc ctg ctc aga ctc gcg gcc cgc aac gtgccc gtg ggc 349 Lys Ala Ala Ala Leu Leu Arg Leu Ala Ala Arg Asn Val ProVal Gly 80 85 90 ttt gga gag agc tgg aag aag cac ctc agc ggg gag ttc gggaaa ccg 397 Phe Gly Glu Ser Trp Lys Lys His Leu Ser Gly Glu Phe Gly LysPro 95 100 105 tat ttt atc aag cta atg gga ttt gtt gca gaa gaa aga aagcat tac 445 Tyr Phe Ile Lys Leu Met Gly Phe Val Ala Glu Glu Arg Lys HisTyr 110 115 120 125 act gtt tat cca ccc cca cac caa gtc ttc acc tgg acccag atg tgt 493 Thr Val Tyr Pro Pro Pro His Gln Val Phe Thr Trp Thr GlnMet Cys 130 135 140 gac ata aaa gat gtg aag gtt gtc atc ctg gga cag gatcca tat cat 541 Asp Ile Lys Asp Val Lys Val Val Ile Leu Gly Gln Asp ProTyr His 145 150 155 gga cct aat caa gct cac ggg ctc tgc ttt agt gtt caaagg cct gtt 589 Gly Pro Asn Gln Ala His Gly Leu Cys Phe Ser Val Gln ArgPro Val 160 165 170 ccg cct ccg ccc agt ttg gag aac att tat aaa gag ttgtct aca gac 637 Pro Pro Pro Pro Ser Leu Glu Asn Ile Tyr Lys Glu Leu SerThr Asp 175 180 185 ata gag gat ttt gtt cat cct ggc cat gga gat tta tctggg tgg gcc 685 Ile Glu Asp Phe Val His Pro Gly His Gly Asp Leu Ser GlyTrp Ala 190 195 200 205 aag caa ggt gtt ctc ctt ctc aac gct gtc ctc acggtt cgt gcc cat 
733 Lys Gln Gly Val Leu Leu Leu Asn Ala Val Leu Thr ValArg Ala His 210 215 220 caa gcc aac tct cat aag gag cga ggc tgg gag cagttc act gat gca 781 Gln Ala Asn Ser His Lys Glu Arg Gly Trp Glu Gln PheThr Asp Ala 225 230 235 gtt gtg tcc tgg cta aat cag aac tcg aat ggc cttgtt ttc ttg ctc 829 Val Val Ser Trp Leu Asn Gln Asn Ser Asn Gly Leu ValPhe Leu Leu 240 245 250 tgg ggc tct tat gct cag aag aag ggc agt gcc attgat agg aag cgg 877 Trp Gly Ser Tyr Ala Gln Lys Lys Gly Ser Ala Ile AspArg Lys Arg 255 260 265 cac cat gta cta cag acg gct cat ccc tcc cct ttgtca gtg tat aga 925 His His Val Leu Gln Thr Ala His Pro Ser Pro Leu SerVal Tyr Arg 270 275 280 285 ggg ttc ttt gga tgt aga cac ttt tca aag accaat gag ctg ctg cag 973 Gly Phe Phe Gly Cys Arg His Phe Ser Lys Thr AsnGlu Leu Leu Gln 290 295 300 aag tct ggc aag aag ccc att gac tgg aag gagctg tgatcatcag 1019 Lys Ser Gly Lys Lys Pro Ile Asp Trp Lys Glu Leu 305310 ctgaggggtg gcctttgaga agctgctgtt aacgtatttg ccagttacga agttccactg1079 aaaattttcc tattaattct taagtactct gcataagggg gaaaagcttc cagaaagcag1139 ccatgaacca ggctgtccag gaatggcagc tgtatccaac cacaaacaac aaaggctacc1199 ctttgaccaa atgtctttct ctgcaacatg gcttcggcct aaaatatgca gaagacagat1259 gaggtcaaat actcagttgg ctctctttat ctcccttgcc tttatggtga aacaggggag1319 atgtgcacct ttcaggcaca gccctagttt ggcgcctgct gctccttggt tttgcctggt1379 tagactttca gtgacagatg ttggggtgtt tttgcttaga aaggtcccct tgtctcagcc1439 ttgcagggca ggcatgccag tctctgccag ttccactgcc cccttgatct ttgaaggagt1499 cctcaggccc ctcgcagcat aaggatgttt tgcaactttc cagaatctgg cccagaaatt1559 agggctcaat ttcctgattg tagtagaggt taagattgct gtgagcttta tcagataaga1619 gaccgagaga agtaagctgg gtcttgttat tccttgggtg ttggtggaat aagcagtgga1679 atttgaacaa ggaagaggag aaaagggaat tttgtcttta tggggtgggg tgattttctc1739 ctagggttat gtccagttgg ggtttttaag gcagcacaga ctgccaagta ctgttttttt1799 taaccgactg aaatcacttt gggatatttt ttcctgcaac actggaaagt tttagttttt1859 taagaagtac tcatgcagat atatatatat atatttttcc cagtcctttt tttaagagac1919 ggtctttatt gggtctgcac ctccatcctt 
gatcttgtta gcaatgctgt ttttgctgtt1979 agtcgggtta gagttggctc tacgcgaggt ttgttaataa aagtttgtta aaagttcaaa2039 aaaaaaaaaa aaaa 2053 2 313 PRT Homo sapiens 2 Met Ile Gly Gln LysThr Leu Tyr Ser Phe Phe Ser Pro Ser Pro Ala 1 5 10 15 Arg Lys Arg HisAla Pro Ser Pro Glu Pro Ala Val Gln Gly Thr Gly 20 25 30 Val Ala Gly ValPro Glu Glu Ser Gly Asp Ala Ala Ala Ile Pro Ala 35 40 45 Lys Lys Ala ProAla Gly Gln Glu Glu Pro Gly Thr Pro Pro Ser Ser 50 55 60 Pro Leu Ser AlaGlu Gln Leu Asp Arg Ile Gln Arg Asn Lys Ala Ala 65 70 75 80 Ala Leu LeuArg Leu Ala Ala Arg Asn Val Pro Val Gly Phe Gly Glu 85 90 95 Ser Trp LysLys His Leu Ser Gly Glu Phe Gly Lys Pro Tyr Phe Ile 100 105 110 Lys LeuMet Gly Phe Val Ala Glu Glu Arg Lys His Tyr Thr Val Tyr 115 120 125 ProPro Pro His Gln Val Phe Thr Trp Thr Gln Met Cys Asp Ile Lys 130 135 140Asp Val Lys Val Val Ile Leu Gly Gln Asp Pro Tyr His Gly Pro Asn 145 150155 160 Gln Ala His Gly Leu Cys Phe Ser Val Gln Arg Pro Val Pro Pro Pro165 170 175 Pro Ser Leu Glu Asn Ile Tyr Lys Glu Leu Ser Thr Asp Ile GluAsp 180 185 190 Phe Val His Pro Gly His Gly Asp Leu Ser Gly Trp Ala LysGln Gly 195 200 205 Val Leu Leu Leu Asn Ala Val Leu Thr Val Arg Ala HisGln Ala Asn 210 215 220 Ser His Lys Glu Arg Gly Trp Glu Gln Phe Thr AspAla Val Val Ser 225 230 235 240 Trp Leu Asn Gln Asn Ser Asn Gly Leu ValPhe Leu Leu Trp Gly Ser 245 250 255 Tyr Ala Gln Lys Lys Gly Ser Ala IleAsp Arg Lys Arg His His Val 260 265 270 Leu Gln Thr Ala His Pro Ser ProLeu Ser Val Tyr Arg Gly Phe Phe 275 280 285 Gly Cys Arg His Phe Ser LysThr Asn Glu Leu Leu Gln Lys Ser Gly 290 295 300 Lys Lys Pro Ile Asp TrpLys Glu Leu 305 310 3 132 DNA Homo sapiens CDS (1)..(132) 3 atg atc ggccag aag acg ctc tac tcc ttt ttc tcc ccc agc ccc gcc 48 Met Ile Gly GlnLys Thr Leu Tyr Ser Phe Phe Ser Pro Ser Pro Ala 1 5 10 15 agg aag cgacac gcc ccc agc ccc gag ccg gcc gtc cag ggg acc ggc 96 Arg Lys Arg HisAla Pro Ser Pro Glu Pro Ala Val Gln Gly Thr Gly 20 25 30 gtg gct ggg gtgcct gag gaa agc gga gat gcg gcg 132 Val Ala Gly Val 
Pro Glu Glu Ser GlyAsp Ala Ala 35 40 4 44 PRT Homo sapiens 4 Met Ile Gly Gln Lys Thr LeuTyr Ser Phe Phe Ser Pro Ser Pro Ala 1 5 10 15 Arg Lys Arg His Ala ProSer Pro Glu Pro Ala Val Gln Gly Thr Gly 20 25 30 Val Ala Gly Val Pro GluGlu Ser Gly Asp Ala Ala 35 40 5 105 DNA Homo sapiens CDS (1)..(105) 5atg ggc gtc ttc tgc ctt ggg ccg tgg ggg ttg ggc cgg aag ctg cgg 48 MetGly Val Phe Cys Leu Gly Pro Trp Gly Leu Gly Arg Lys Leu Arg 1 5 10 15acg cct ggg aag ggg ccg ctg cag ctc ttg agc cgc ctc tgc ggg gac 96 ThrPro Gly Lys Gly Pro Leu Gln Leu Leu Ser Arg Leu Cys Gly Asp 20 25 30 cacttg cag 105 His Leu Gln 35 6 35 PRT Homo sapiens 6 Met Gly Val Phe CysLeu Gly Pro Trp Gly Leu Gly Arg Lys Leu Arg 1 5 10 15 Thr Pro Gly LysGly Pro Leu Gln Leu Leu Ser Arg Leu Cys Gly Asp 20 25 30 His Leu Gln 357 1399 DNA Homo sapiens 7 tcaaagctca ctacagctca gaccctctgg cctcaagcgatcctccagcc tgggcctccc 60 aaagcgctag gattacaggc gtgggccacc gcgcctgaccagtcttctct tcttgcagct 120 gagccttaag agcctgtcca aagagcagag gtgggctgaaggcacaaagc gaatgaaaga 180 ataggccccc gggcaccgtt gcacgcccca cctcctcccaggggcgttgc actccagccc 240 ctcccgcaca tgcgcactgg gccttccacc gccccccgcccccagcaaag ccccccgctc 300 ggagcatgcg cgggccgctt ggcgccaatt gctgaccgccacagccacag ccagggctag 360 cctcgccggt tcccgggtgg cgcgcgttcg ctgcctcctcagctccagga tgatcggcca 420 gaagacgctc tactcctttt tctcccccag ccccgccaggaagcgacacg cccccagccc 480 cgagccggcc gtccagggga ccggcgtggc tggggtgcctgaggaaagcg gagatgcggc 540 ggtgaggcgc ggcttgggcc ggggctaggg ggtgaagggggaggaaggcg gtgggccccg 600 cctgacggag ggcgtgcagg atcgcgcctc tgactcggtaaacccgggct ccgctttcca 660 aatagcctcc acgtgttcaa aatagccgcc gctgtcccccatgggccgcc atgctaaagg 720 gccagccaat gggaacgcgt ctcggggccc atggcgccaatccgcgcgcc gcaggccctc 780 ctggctcggt gcgctgtcca atcagagggg agagggggcgggacccagag ggaggttttt 840 tgccgcgaaa agaccacgtg gggacgcggt ggggcgggtctggcgggggc ggggcacctc 900 tgtgcagggt tcccagtcac cgcgacgctc ctcgggaagccatagggcgc ctcccagccc 960 gtctccccgc tccagtttag aacctaattc ccaattcccgaccgggccca gccctgggct 1020 cttactgtcc 
gcttttgctg ggacctgttc cacaaatgggcgtcttctgc cttgggccgt 1080 gggggttggg ccggaagctg cggacgcctg ggaaggggccgctgcagctc ttgagccgcc 1140 tctgcgggga ccacttgcag gccatcccag ccaagaaggccccggctggg caggaggagc 1200 ctgggacgcc gccctcctcg ccgctgagtg ccgagcagttggaccggatc cagaggaaca 1260 aggccgcggc cctgctcaga ctcgcggccc gcaacgtgcccgtgggcttt ggagagagct 1320 ggaagaagca cctcagcggg gagttcggga aaccgtattttatcaaggta aatatggaaa 1380 tgcaccttcc ataagggta 1399 8 304 PRT Homosapiens 8 Met Gly Val Phe Cys Leu Gly Pro Trp Gly Leu Gly Arg Lys LeuArg 1 5 10 15 Thr Pro Gly Lys Gly Pro Leu Gln Leu Leu Ser Arg Leu CysGly Asp 20 25 30 His Leu Gln Ala Ile Pro Ala Lys Lys Ala Pro Ala Gly GlnGlu Glu 35 40 45 Pro Gly Thr Pro Pro Ser Ser Pro Leu Ser Ala Glu Gln LeuAsp Arg 50 55 60 Ile Gln Arg Asn Lys Ala Ala Ala Leu Leu Arg Leu Ala AlaArg Asn 65 70 75 80 Val Pro Val Gly Phe Gly Glu Ser Trp Lys Lys His LeuSer Gly Glu 85 90 95 Phe Gly Lys Pro Tyr Phe Ile Lys Leu Met Gly Phe ValAla Glu Glu 100 105 110 Arg Lys His Tyr Thr Val Tyr Pro Pro Pro His GlnVal Phe Thr Trp 115 120 125 Thr Gln Met Cys Asp Ile Lys Asp Val Lys ValVal Ile Leu Gly Gln 130 135 140 Asp Pro Tyr His Gly Pro Asn Gln Ala HisGly Leu Cys Phe Ser Val 145 150 155 160 Gln Arg Pro Val Pro Pro Pro ProSer Leu Glu Asn Ile Tyr Lys Glu 165 170 175 Leu Ser Thr Asp Ile Glu AspPhe Val His Pro Gly His Gly Asp Leu 180 185 190 Ser Gly Trp Ala Lys GlnGly Val Leu Leu Leu Asn Ala Val Leu Thr 195 200 205 Val Arg Ala His GlnAla Asn Ser His Lys Glu Arg Gly Trp Glu Gln 210 215 220 Phe Thr Asp AlaVal Val Ser Trp Leu Asn Gln Asn Ser Asn Gly Leu 225 230 235 240 Val PheLeu Leu Trp Gly Ser Tyr Ala Gln Lys Lys Gly Ser Ala Ile 245 250 255 AspArg Lys Arg His His Val Leu Gln Thr Ala His Pro Ser Pro Leu 260 265 270Ser Val Tyr Arg Gly Phe Phe Gly Cys Arg His Phe Ser Lys Thr Asn 275 280285 Glu Leu Leu Gln Lys Ser Gly Lys Lys Pro Ile Asp Trp Lys Glu Leu 290295 300 9 295 PRT Mus sp. 
9 Met Gly Val Leu Gly Arg Arg Ser Leu Arg LeuAla Arg Arg Ala Gly 1 5 10 15 Leu Arg Ser Leu Thr Pro Asn Pro Asp SerAsp Ser Arg Gln Ala Ser 20 25 30 Pro Ala Lys Lys Ala Arg Val Glu Gln AsnGlu Gln Gly Ser Pro Leu 35 40 45 Ser Ala Glu Gln Leu Val Arg Ile Gln ArgAsn Lys Ala Ala Ala Leu 50 55 60 Leu Arg Leu Ala Ala Arg Asn Val Pro AlaGly Phe Gly Glu Ser Trp 65 70 75 80 Lys Gln Gln Leu Cys Gly Glu Phe GlyLys Pro Tyr Phe Val Lys Leu 85 90 95 Met Gly Phe Val Ala Glu Glu Arg AsnHis His Lys Val Tyr Pro Pro 100 105 110 Pro Glu Gln Val Phe Thr Trp ThrGln Met Cys Asp Ile Arg Asp Val 115 120 125 Lys Val Val Ile Leu Gly GlnAsp Pro Tyr His Gly Pro Asn Gln Ala 130 135 140 His Gly Leu Cys Phe SerVal Gln Arg Pro Val Pro Pro Pro Pro Ser 145 150 155 160 Leu Glu Asn IlePhe Lys Glu Leu Ser Thr Asp Ile Asp Gly Phe Val 165 170 175 His Pro GlyHis Gly Asp Leu Ser Gly Trp Ala Arg Gln Gly Val Leu 180 185 190 Leu LeuAsn Ala Val Leu Thr Val Arg Ala His Gln Ala Asn Ser His 195 200 205 LysGlu Arg Gly Trp Glu Gln Phe Thr Asp Ala Val Val Ser Trp Leu 210 215 220Asn Gln Asn Leu Ser Gly Leu Val Phe Leu Leu Trp Gly Ser Tyr Ala 225 230235 240 Gln Lys Lys Gly Ser Val Ile Asp Arg Lys Arg His His Val Leu Gln245 250 255 Thr Ala His Pro Ser Pro Leu Ser Val Tyr Arg Gly Phe Leu GlyCys 260 265 270 Arg His Phe Ser Lys Ala Asn Glu Leu Leu Gln Lys Ser GlyLys Lys 275 280 285 Pro Ile Asn Trp Lys Glu Leu 290 295 10 306 PRT Mussp. 
10 Met Ile Gly Gln Lys Thr Leu Tyr Ser Phe Phe Ser Pro Thr Pro Thr 15 10 15 Gly Lys Arg Thr Thr Arg Ser Pro Glu Pro Val Pro Gly Ser Gly Val20 25 30 Ala Ala Glu Ile Gly Gly Asp Ala Val Ala Ser Pro Ala Lys Lys Ala35 40 45 Arg Val Glu Gln Asn Glu Gln Gly Ser Pro Leu Ser Ala Glu Gln Leu50 55 60 Val Arg Ile Gln Arg Asn Lys Ala Ala Ala Leu Leu Arg Leu Ala Ala65 70 75 80 Arg Asn Val Pro Ala Gly Phe Gly Glu Ser Trp Lys Gln Gln LeuCys 85 90 95 Gly Glu Phe Gly Lys Pro Tyr Phe Val Lys Leu Met Gly Phe ValAla 100 105 110 Glu Glu Arg Asn His His Lys Val Tyr Pro Pro Pro Glu GlnVal Phe 115 120 125 Thr Trp Thr Gln Met Cys Asp Ile Arg Asp Val Lys ValVal Ile Leu 130 135 140 Gly Gln Asp Pro Tyr His Gly Pro Asn Gln Ala HisGly Leu Cys Phe 145 150 155 160 Ser Val Gln Arg Pro Val Pro Pro Pro ProSer Leu Glu Asn Ile Phe 165 170 175 Lys Glu Leu Ser Thr Asp Ile Asp GlyPhe Val His Pro Gly His Gly 180 185 190 Asp Leu Ser Gly Trp Ala Arg GlnGly Val Leu Leu Leu Asn Ala Val 195 200 205 Leu Thr Val Arg Ala His GlnAla Asn Ser His Lys Glu Arg Gly Trp 210 215 220 Glu Gln Phe Thr Asp AlaVal Val Ser Trp Leu Asn Gln Asn Leu Ser 225 230 235 240 Gly Leu Val PheLeu Leu Trp Gly Ser Tyr Ala Gln Lys Lys Gly Ser 245 250 255 Val Ile AspArg Lys Arg His His Val Leu Gln Thr Ala His Pro Ser 260 265 270 Pro LeuSer Val Tyr Arg Gly Phe Leu Gly Cys Arg His Phe Ser Lys 275 280 285 AlaAsn Glu Leu Leu Gln Lys Ser Gly Lys Lys Pro Ile Asn Trp Lys 290 295 300Glu Leu 305

What is claimed is:
 1. An isolated cytosine DNA glycosylase (CDG) capable of releasing cytosine bases from single stranded (ss) DNA and double stranded (ds) DNA obtainable by modification of a uracil-DNA glycosylase (UDG), wherein asparagine (Asn) at amino acid position 204 in the human UDG protein encoded by the human uracil nucleic acid glycosylase gene (UNG1) (SEQ ID NO: 8), or by the alternatively spliced human uracil nucleic acid glycosylase gene (UNG2) (SEQ ID NO: 2), or an equivalent residue in a homologous UDG of another species, is replaced with an aspartic acid residue (Asp).
 2. A cytosine-DNA glycosylase (CDG) as claimed in claim 1 wherein said CDG is capable of releasing both cytosine and uracil bases from ssDNA and dsDNA.
 3. A CDG as claimed in claim 1 wherein said UDG which is modified is human.
 4. A DNA glycosylase as claimed in claim 1, which additionally comprises at least one nuclear localization peptide sequence encoded by a nucleic acid molecule comprising the sequence

ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC TGAGGAAAGC GGAGATGCGG CG (SEQ. ID. No.: 3)

or a fragment thereof encoding a functional equivalent or a sequence which is degenerate, or which hybridizes under conditions of wash at 2×SSC, 65° C. with any such aforesaid sequence or at least one mitochondrial localization peptide sequence encoded by a nucleic acid molecule comprising the sequence

ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG (SEQ. ID. No.: 5)

or a fragment thereof encoding a functional equivalent or a sequence which is degenerate, or which hybridizes under conditions of wash at 2×SSC, 65° C. with any such aforesaid sequence.
 5. A DNA glycosylase as claimed in claim 4 wherein said nuclear localizing peptide includes the amino acid sequence wherein RKR is followed by histidine.
 6. A method of performing enzymatic DNA sequencing to determine the position of cytosine bases comprising the step of treating said DNA with at least one CDG as defined in claim 1.
 7. A method of performing in vitro mutagenesis which method comprises introducing one or more DNA glycosylases of claim 1 to a complex sample comprising a nucleic acid substrate, wherein particular bases are removed and optionally replaced in said nucleic acid substrate.
 8. A method of performing in vivo mutagenesis which method comprises introducing a DNA glycosylase of claim 1 into a liposome, and introducing said liposome into a suitable host cell.
 9. A method of removing contaminating DNA from a sample prior to PCR amplification, comprising introducing one or more DNA glycosylases of claim 1 to a reaction mix, wherein contaminating DNA are digested by said one or more DNA glycosylases, and inactivating said one or more DNA glycosylases prior to addition of a DNA sample and amplification.
 10. A method of producing randomly fragmented DNA comprising introducing one or more DNA glycosylases of claim 1 to a DNA sample to yield one or more apyrimidic site(s) (AP-site(s)) in said sample, and treating said sample with an alkaline solution alone, an apurinic/apyrimidinic-site endonuclease (AP-endonuclease), or a combination thereof, wherein breaks are produced in the DNA at the AP-site(s) which yield randomly fragmented DNA of defined size ranges.